Transfer learning or de novo training with StyleGAN on Google Colab

I have had some success doing transfer learning on a new dataset with pre-trained StyleGAN networks. Here is an example of using the official pre-trained bedroom network on new data:

[Image: pretrained_sG.png]

Installing and importing needed packages

(Note: this notebook builds on the excellent work at https://www.analyticsvidhya.com/blog/2020/12/training-stylegan-using-transfer-learning-on-a-custom-dataset-in-google-colaboratory/, but expands a bit on the transfer-learning part.)

This notebook assumes you have already created your dataset using StyleGAN's dataset tool (dataset_tool.py). If not, please see here:

First, we need to mount Google Drive (for saving checkpoints and preview images). Then we will install and import the packages needed for StyleGAN (v1).

In [ ]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True) ##Import and mount drive, if you are not on COLAB omit this cell
Mounted at /content/drive
In [ ]:
%tensorflow_version 1.x
# In Colab, the magic above switches the TensorFlow version. Keep comments off that line, or Colab parses them as part of the version argument.
# If you are not on Colab, omit the magic and make sure your environment has TF <= 1.15 installed.
# (Recent Colab runtimes have dropped TensorFlow 1.x support, so the magic may no longer work there.)
import tensorflow
print(tensorflow.__version__)
TensorFlow 1.x selected.
1.15.2
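StyleGAN v1 only runs on TensorFlow 1.x, so it can be worth failing fast if the wrong version is active. The helper below is a hypothetical guard of my own, not part of StyleGAN; it only parses the version string and assumes the usual major.minor.patch format.

```python
def is_supported_tf_version(version):
    """Return True for TensorFlow 1.x releases up to and including 1.15."""
    # Only the major and minor components matter for this check.
    major, minor = (int(part) for part in version.split(".")[:2])
    return major == 1 and minor <= 15


# Usage in the notebook, after importing TensorFlow:
# import tensorflow
# assert is_supported_tf_version(tensorflow.__version__), tensorflow.__version__
```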
In [ ]:
!git clone https://github.com/NVlabs/stylegan.git #Now clone stylegan (V1) from github
!ls /content/stylegan/
Cloning into 'stylegan'...
remote: Enumerating objects: 86, done.
remote: Total 86 (delta 0), reused 0 (delta 0), pack-reused 86
Unpacking objects: 100% (86/86), done.
config.py	     LICENSE.txt	    run_metrics.py
dataset_tool.py      metrics		    stylegan-teaser.png
dnnlib		     pretrained_example.py  training
generate_figures.py  README.md		    train.py
In [ ]:
import sys
sys.path.insert(0, "/content/stylegan") #add stylegan folder to path
import dnnlib  # NVIDIA's helper library bundled with the StyleGAN repo
In [ ]:
#Other imports for later functions
import os
from tqdm import tqdm
import glob
import cv2
from PIL import Image
from IPython.display import clear_output

Setting up the stylegan scripts

Now we need to set up the StyleGAN scripts for training on a custom dataset.

config.py

First, open /content/stylegan/config.py and change line 13 from:

result_dir = 'results'

To:

result_dir = '/content/drive/MyDrive'

This saves checkpoints and preview images to your Google Drive, so when your Colab GPU instance times out you can pick up your partly trained model later. Alternatively, you can change this to a subfolder of your choosing.
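If you would rather not edit files by hand after every fresh clone, the same change can be scripted. This is a minimal sketch; patch_text and patch_file are hypothetical helpers, and they assume the target line appears exactly once in the file (they raise an error otherwise, so a changed upstream file is not silently edited in the wrong place).

```python
def patch_text(text, old, new):
    """Replace the single line whose stripped text equals `old`, keeping its indentation."""
    lines = text.split("\n")
    matches = [i for i, line in enumerate(lines) if line.strip() == old.strip()]
    if len(matches) != 1:
        raise ValueError(f"expected exactly one match for {old!r}, found {len(matches)}")
    i = matches[0]
    indent = lines[i][: len(lines[i]) - len(lines[i].lstrip())]
    lines[i] = indent + new.strip()
    return "\n".join(lines)


def patch_file(path, old, new):
    """Apply patch_text to a file in place."""
    with open(path) as f:
        text = f.read()
    with open(path, "w") as f:
        f.write(patch_text(text, old, new))


# Usage (paths as used in this notebook):
# patch_file("/content/stylegan/config.py",
#            "result_dir = 'results'",
#            "result_dir = '/content/drive/MyDrive'")
```

The same helper works for the other single-line edits in this notebook, as long as you pass the exact original line.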

train.py

Next, open /content/stylegan/train.py. A few lines need to change here. First, go to line 37 and change it.
From:

desc += '-ffhq';     dataset = EasyDict(tfrecord_dir='ffhq');                 train.mirror_augment = True

To:

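desc += '-custom';     dataset = EasyDict(tfrecord_dir='/path/to/dataset');                 train.mirror_augment = True

This points StyleGAN to your custom dataset. The desc string only labels the run: it becomes part of the results folder name, so a short name without slashes avoids deeply nested result folders. Next, go to lines 45-50 and change them from: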

    # Number of GPUs.
    #desc += '-1gpu'; submit_config.num_gpus = 1; sched.minibatch_base = 4; sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
    #desc += '-2gpu'; submit_config.num_gpus = 2; sched.minibatch_base = 8; sched.minibatch_dict = {4: 256, 8: 256, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8}
    #desc += '-4gpu'; submit_config.num_gpus = 4; sched.minibatch_base = 16; sched.minibatch_dict = {4: 512, 8: 256, 16: 128, 32: 64, 64: 32, 128: 16}
    desc += '-8gpu'; submit_config.num_gpus = 8; sched.minibatch_base = 32; sched.minibatch_dict = {4: 512, 8: 256, 16: 128, 32: 64, 64: 32}

To:

    # Number of GPUs.
    desc += '-1gpu'; submit_config.num_gpus = 1; sched.minibatch_base = 4; sched.minibatch_dict = {4: 128, 8: 128, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8, 512: 4}
    #desc += '-2gpu'; submit_config.num_gpus = 2; sched.minibatch_base = 8; sched.minibatch_dict = {4: 256, 8: 256, 16: 128, 32: 64, 64: 32, 128: 16, 256: 8}
    #desc += '-4gpu'; submit_config.num_gpus = 4; sched.minibatch_base = 16; sched.minibatch_dict = {4: 512, 8: 256, 16: 128, 32: 64, 64: 32, 128: 16}
    #desc += '-8gpu'; submit_config.num_gpus = 8; sched.minibatch_base = 32; sched.minibatch_dict = {4: 512, 8: 256, 16: 128, 32: 64, 64: 32}

If you are training on Colab, this change is needed because Colab instances provide a single GPU. If you are training on your own machine, set this to the number of GPUs you have available. Optionally, you can also change line 52, train.total_kimg = 25000. This is the length of training in kimg (thousands of real images shown to the network), so increasing or decreasing it lengthens or shortens training. Because the resolution schedule advances with the number of images seen, it also decides how far training gets through that schedule, and therefore affects the result.
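The GPU change amounts to uncommenting one configuration line and commenting out the others. Here is a sketch that does this on the text of train.py, assuming the lines keep the desc += '-Ngpu' form shown above; select_gpu_config is a hypothetical helper, not part of StyleGAN.

```python
import re

# Matches an optionally commented "desc += '-<N>gpu' ..." line and captures its indent, body and N.
GPU_LINE = re.compile(r"^(\s*)#?\s*(desc \+= '-(\d+)gpu'.*)$")


def select_gpu_config(text, num_gpus=1):
    """Uncomment the train.py line for `num_gpus` GPUs and comment out the other GPU lines."""
    out = []
    found = False
    for line in text.split("\n"):
        m = GPU_LINE.match(line)
        if m:
            indent, body, n = m.group(1), m.group(2), int(m.group(3))
            if n == num_gpus:
                found = True
                line = indent + body
            else:
                line = indent + "#" + body
        out.append(line)
    if not found:
        raise ValueError(f"no GPU configuration line for {num_gpus} GPU(s)")
    return "\n".join(out)


# Usage (reads and rewrites the cloned train.py in place):
# with open("/content/stylegan/train.py") as f:
#     text = f.read()
# with open("/content/stylegan/train.py", "w") as f:
#     f.write(select_gpu_config(text, num_gpus=1))
```

Running it twice gives the same result, so it is safe to keep in a setup cell that runs on every fresh Colab session.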

frechet_inception_distance.py

Next, open https://drive.google.com/drive/folders/1MASQyN5m0voPcx7-9K0r5gObhvvPups7 and go to the metrics subfolder. Save or download "inception_v3_features.pkl" to your Google Drive. Then open /content/stylegan/metrics/frechet_inception_distance.py.
Change line 29 from:

inception = misc.load_pkl('https://drive.google.com/uc?id=1MzTY44rLToO5APn8TZmfR7_ENSe5aZUn')

to:

inception = misc.load_pkl('/path/to/inception_v3_features.pkl')

This points the script at the copy of inception_v3_features.pkl on your Google Drive. If you skip this, the script tries to download the file from the original Drive link, which often fails because that file has exceeded Google Drive's download limit. Keeping a copy on your own Drive avoids the issue.
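Before launching a long run, a quick check of the path you gave misc.load_pkl can catch the easy mistakes: leaving the Drive URL in place, pointing at a folder instead of the .pkl file, or Drive not being mounted. check_inception_path is a hypothetical helper of my own; it only inspects the path string and the filesystem.

```python
import os


def check_inception_path(path):
    """Return a list of likely problems with the path passed to misc.load_pkl (empty if none)."""
    problems = []
    if path.startswith(("http://", "https://")):
        problems.append("path is a URL, so the script will try to download it")
    elif path.endswith("/"):
        problems.append("path ends with '/', but it must name the .pkl file itself")
    elif not os.path.isfile(path):
        problems.append("file not found (is Google Drive mounted?)")
    return problems


# Usage:
# print(check_inception_path('/path/to/inception_v3_features.pkl'))
```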

That's all you need to do if you are training from scratch. Read on to use pre-trained networks or to resume training.

Transfer learning or resuming a previous training session

I have had some success using pre-trained network weights to speed up training. To do this, select a pre-trained network that suits your dataset. Generally, I have chosen a pre-trained network that:

  • Has the same resolution as your dataset
  • Has generally similar features/colours/structure
  • Has been trained for a high number of kimg (thousands of images)

You can choose from the official pre-trained networks here: https://drive.google.com/drive/folders/1MASQyN5m0voPcx7-9K0r5gObhvvPups7 or the anime-face StyleGAN network found here: https://www.gwern.net/Faces#anime-faces.
Save or upload the pre-trained network of your choice to your Google Drive.
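To check the resolution criterion, you can read your dataset's resolution off the files dataset_tool.py produced. As far as I can tell, the tool writes one <name>-rXX.tfrecords file per resolution, where XX is log2 of that resolution, so the largest XX gives the full size. dataset_resolution is a hypothetical helper built on that naming assumption. For comparison, the official bedroom network is 256x256.

```python
import re


def dataset_resolution(tfrecord_names):
    """Infer full dataset resolution from dataset_tool.py output names like 'custom-r08.tfrecords'."""
    log2s = []
    for name in tfrecord_names:
        m = re.search(r"-r(\d{2})\.tfrecords$", name)
        if m:
            log2s.append(int(m.group(1)))  # XX is log2 of that level's resolution
    if not log2s:
        raise ValueError("no *-rXX.tfrecords files found")
    return 2 ** max(log2s)


# Usage:
# import os
# print(dataset_resolution(os.listdir('/path/to/dataset')))
```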

training_loop.py

Next, we need to tell StyleGAN to use these pre-trained weights. To do so, open /content/stylegan/training/training_loop.py. First, modify line 136 from:

resume_run_id           = None,     # Run ID or network pkl to resume training from, None = start from scratch.

To:

resume_run_id           = '/path/to/pretrained_network.pkl',     # Run ID or network pkl to resume training from, None = start from scratch.

If you are resuming from a previous run, set this to your latest network snapshot checkpoint instead (a network-snapshot-XXXXXX.pkl file in that run's results folder). Note: you do not need to change the resume_snapshot line immediately below it.
Next, change line 138 from:

resume_kimg             = 0.0,      # Assumed training progress at the beginning. Affects reporting and training schedule.

To:

resume_kimg             = NUMBER_OF_IMAGES_SEEN,      # Assumed training progress at the beginning. Affects reporting and training schedule.

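Set this to the training progress of the network you are starting from, measured in kimg (thousands of real images seen). If you are using the official pre-trained networks, 15000.0 is a safe bet. This value drives the training schedule, including progressive growing: with resume_kimg left at 0, training restarts at low resolution and fades the higher-resolution layers back in, which does not match a network already trained at full resolution and can lead to some funky results.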
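To see why resume_kimg matters, here is a simplified re-implementation of the level-of-detail (lod) part of training_schedule() in training/training_loop.py, as I read it. It assumes that function's defaults of 600 kimg per training phase and 600 kimg per fade-in transition, plus sched.lod_initial_resolution = 8 from train.py; check these against your copy. An lod of 0 means the network is training at full resolution. training_lod is a hypothetical name.

```python
import math


def training_lod(kimg, resolution, lod_initial_resolution=8,
                 lod_training_kimg=600.0, lod_transition_kimg=600.0):
    """Return (lod, current_resolution) after `kimg` thousand images (simplified sketch)."""
    resolution_log2 = int(math.log2(resolution))
    phase_dur = lod_training_kimg + lod_transition_kimg
    phase_idx = int(math.floor(kimg / phase_dur)) if phase_dur > 0 else 0
    phase_kimg = kimg - phase_idx * phase_dur
    # Start at the initial resolution and drop one lod per completed phase.
    lod = resolution_log2 - math.floor(math.log2(lod_initial_resolution)) - phase_idx
    # During the transition half of a phase, lod falls linearly (the fade-in).
    if lod_transition_kimg > 0:
        lod -= max(phase_kimg - lod_training_kimg, 0.0) / lod_transition_kimg
    lod = max(lod, 0.0)
    current_resolution = 2 ** (resolution_log2 - int(math.floor(lod)))
    return lod, current_resolution
```

For a 256x256 network, resume_kimg = 0 restarts training at lod 5 (8x8), whereas resume_kimg = 15000 keeps it at full resolution from the first step.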
Finally, you may need to change line 129 from:

total_kimg              = 15000,    # Total length of the training, measured in thousands of real images.

To:

total_kimg              = NUMBER_GREATER_THAN_IMAGES_SEEN,    # Total length of the training, measured in thousands of real images.

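Otherwise, training stops straight away, because the loop only runs while the number of images seen is below total_kimg. Note that train.py passes its own train.total_kimg (25000 by default, line 52) to the training loop, overriding this default, so when you launch through train.py that is the value that must exceed resume_kimg.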
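As a worked example of the stopping rule: with train.py's default of 25000 kimg and resume_kimg = 15000, the run trains on a further 10000 kimg, i.e. 10 million real images, before stopping. A small hypothetical helper makes the check explicit before you launch:

```python
def remaining_kimg(resume_kimg, total_kimg):
    """Thousands of real images left to train; training stops once total_kimg * 1000 images are seen."""
    remaining = total_kimg - resume_kimg
    if remaining <= 0:
        raise ValueError(f"total_kimg ({total_kimg}) must exceed resume_kimg ({resume_kimg})")
    return remaining


# Usage:
# print(remaining_kimg(15000.0, 25000))  # kimg still to train
```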

That's it: the scripts are now set up, and all that remains is to start training. Note that training is slow: StyleGAN reports progress in ticks rather than epochs, and a tick can take an hour or more, even on a GPU.

Crucially, StyleGAN saves a copy of its source code, including these modifications, into the src/ subfolder of each run's results folder. This means you do not need to redo all of these changes every time you want to resume training: you only need to update training_loop.py (the resume path and resume_kimg) in that copy, and then point your run command at the train.py in the same subfolder:

! python /content/drive/MyDrive/results/00023-sgan-/content/drive/MyDrive/data/custom-dataset2-1gpu/src/train.py
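When resuming, you need both the latest snapshot path (for resume_run_id) and its kimg (for resume_kimg). StyleGAN v1 names snapshots network-snapshot-XXXXXX.pkl, where XXXXXX is the kimg at which the snapshot was written, so a sketch like the following can recover both. latest_snapshot is a hypothetical helper, and the results path in the usage comment assumes the result_dir setting from earlier; adjust it to yours.

```python
import os
import re

SNAPSHOT = re.compile(r"network-snapshot-(\d{6})\.pkl$")


def latest_snapshot(paths):
    """Return (path, kimg) for the snapshot with the highest kimg, or None if there is none."""
    best = None
    for path in paths:
        m = SNAPSHOT.search(os.path.basename(path))
        if m:
            kimg = int(m.group(1))
            if best is None or kimg > best[1]:
                best = (path, kimg)
    return best


# Usage (recursive glob, because a desc containing slashes nests the run folders):
# import glob
# found = latest_snapshot(glob.glob('/content/drive/MyDrive/results/**/network-snapshot-*.pkl', recursive=True))
# if found:
#     print("resume_run_id =", repr(found[0]), "| resume_kimg =", float(found[1]))
```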

Begin training

Now all you need to do is call the train.py script!

In [ ]:
! python /content/stylegan/train.py