Objective
Transfer learning is used when the available dataset is not of sufficient size to train a network from scratch. It is common to fine-tune a network that is pre-trained on a large dataset like ImageNet for classification tasks. For further information, the reader is advised to refer to Stanford's CS231n course. This article proposes a framework for general-purpose transfer learning. The framework was developed for my MSc. thesis study and is made publicly available so that researchers can make use of it.
Using this framework, a researcher can easily fine-tune a network for a classification task.
Audience
This article can be useful for anyone seeking information about implementing transfer learning. Python knowledge is required to make use of the supplied code, and introductory knowledge of Keras, a deep learning API, is necessary.
Also, in order to run the code, a proper deep learning system with a decent graphics card (GPU) with CUDA Compute Capability 3.0 or higher is necessary.
Information on how to create a deep learning system can be found in one of the following resources:
- Install Tensorflow for Windows
- Install Tensorflow for Linux
- Prepare an Ubuntu System for Deep Learning
Why Keras?
Keras is a deep learning API which can be used with TensorFlow, Theano or CNTK. Keras introduces a simple and intuitive API, and it is easy to find resources about it. Keras also comes with network weights for popular convolutional neural networks.
Why a Transfer Learning Framework?
Keras already provides a simple and intuitive interface for transfer learning; the Keras Applications page is worth visiting for more information. However, doing research requires more than what Keras provides. The proposed Transfer Learning Framework aims to eliminate boilerplate code for researchers. To give an idea: researchers need to run many tests with different parameters, and it often becomes hard to keep track of experiment results and configurations. Keras does not come with tools to visualize experiment results.
The Transfer Learning Framework provides configuration options so that different experiments can be run without code changes.
User Guide
A. Fine Tune A Model
Let's look at how a model is fine-tuned. As a demo, the VGGFace model is fine-tuned for emotion recognition using the FER-13 dataset. The input parameters are the location of the dataset, the weights file to fine-tune and a working directory for the outputs.
The input dimension is determined by the network to fine-tune; for the VGGFace model it is 224x224. The batch size is determined by the image dimensions and the available GPU memory; the larger the batch size, the faster the learning process will be. The learning rate is better left at 0.001, and the optimizer can be left as stochastic gradient descent. The number of layers to train during fine-tuning is also important.
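As a rough sketch, an experiment configuration might look like the following. WEIGHTS_FILE and DATA_DIR are variable names actually used by the framework; the remaining names and paths are illustrative placeholders, so check the repository for the exact option names.

    # Sketch of an experiment configuration; only WEIGHTS_FILE and DATA_DIR
    # are the framework's real variable names, the rest are placeholders.
    DATA_DIR = '/path/to/fer13'                    # dataset root directory
    WEIGHTS_FILE = '/path/to/vggface_weights.h5'   # weights to fine-tune
    WORK_DIR = '/path/to/work'                     # outputs go here

    IMAGE_DIM = 224        # input dimension required by VGGFace
    BATCH_SIZE = 32        # limited by available GPU memory
    LEARNING_RATE = 0.001  # better left at this value
    OPTIMIZER = 'sgd'      # stochastic gradient descent
    TRAINABLE_LAYERS = 4   # number of layers to train during fine-tuning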
All configuration values which are likely to affect model performance are recorded in a parameters.txt file.
All outputs can be found in an experiment directory which is created under the working directory. The experiment directory is named using the hash value of the parameters.txt file and a timestamp. This way it is easy to track multiple experiments with the same configuration.
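As an illustration of this naming scheme (not the framework's exact code), such a directory name could be derived roughly like this:

    import hashlib, time

    # Hash the recorded configuration and append a timestamp: runs with the
    # same configuration share the hash part but still get unique directories.
    with open('parameters.txt', 'rb') as f:
        params_hash = hashlib.md5(f.read()).hexdigest()[:8]
    experiment_dir = '%s-%s' % (params_hash, time.time())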
The most important output file is the weights file, which can be found under the checkpoints directory. The weights files also contain the network architecture, so they are sufficient to use the trained network later.
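Because the architecture is saved along with the weights, such a file can be restored with the standard Keras loading call; the file name below is illustrative.

    from keras.models import load_model

    # Load architecture and weights in one step; use the file produced
    # under your experiment's checkpoints directory.
    model = load_model('checkpoints/weights-best.h5')
    predictions = model.predict(images)  # images: batch of preprocessed inputs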
Validation results are kept in a results.txt file. The results file starts with the confusion matrix. Next, it contains the loss, accuracy, precision and recall values. In the last row, the number of validation samples used is printed.
The logs directory keeps the log of validation and training accuracy/loss values after each epoch. Before fine-tuning a model, a short initial fine-tuning step is run, the length of which is specified by the top_epochs parameter. This is why you will find two log files.
Confusion matrices are created in the experiment directory, and loss and accuracy curves are created in the same directory. The accuracy and loss curves are divided into two sections by a vertical red line. That line shows the boundary between the short initial and the actual fine-tuning procedures.
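A minimal matplotlib sketch of such a curve, assuming the per-epoch values have been read from the two log files (the variable names are illustrative):

    import matplotlib.pyplot as plt

    # train_acc/val_acc: per-epoch values from both log files, concatenated;
    # top_epochs marks where the short initial fine-tuning step ended.
    plt.plot(train_acc, label='train accuracy')
    plt.plot(val_acc, label='validation accuracy')
    plt.axvline(x=top_epochs, color='red')  # boundary between the two phases
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.legend()
    plt.savefig('accuracy_curve.png')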
As an example, the following experiment directory in the GitHub repository can be examined:
https://github.com/habanoz/deep-emotion-recognition/tree/master/thesis/models/vggface/1-1512579280.12-17403535-best
[Figure: Training/Validation Accuracy Curve]
[Figure: Normalized Confusion Matrix]
B. Fine Tune Using Weights Files
Suppose you want to fine-tune a model you created, this time with another dataset. It is as easy as pointing the WEIGHTS_FILE variable to the location of the weights file and pointing the DATA_DIR variable to the new dataset. That's all.
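For instance (the paths are illustrative):

    # Start from a previously fine-tuned model and switch to a new dataset.
    WEIGHTS_FILE = '/path/to/previous/checkpoints/weights-best.h5'
    DATA_DIR = '/path/to/new/dataset'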
C. Training For Other Problems
Extend the KerasTrainCnnBase class, as in the case of the EmotTrainCnn class, for a new problem. The only major difference will probably be the number of classes.
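A hypothetical subclass might look like the sketch below; the constructor signature and attribute name are assumptions made for illustration, so mirror EmotTrainCnn in the repository for the actual interface.

    # Hypothetical sketch: constructor signature and attribute name
    # are assumptions; see EmotTrainCnn for the real interface.
    class MyProblemTrainCnn(KerasTrainCnnBase):
        def __init__(self, *args, **kwargs):
            super(MyProblemTrainCnn, self).__init__(*args, **kwargs)
            self.nb_classes = 10  # number of classes in the new problem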
D. Changing the Underlying Network
Currently, the InceptionV3, ResNet50, VGG16 and VGGFace networks are supported. Model weights are for the ImageNet dataset, except for the VGGFace model, which is trained using the Oxford Face dataset. InceptionV3, ResNet50 and VGG16 weights are provided by Keras. VGGFace weights are provided by https://github.com/rcmalli/keras-vggface.
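Independently of the framework, the corresponding base models can be obtained as follows; this is plain Keras and keras-vggface usage, not the framework's internal code.

    from keras.applications import VGG16  # InceptionV3, ResNet50 likewise
    from keras_vggface.vggface import VGGFace  # from rcmalli/keras-vggface

    # include_top=False drops the original classifier layers so a new
    # classifier can be attached for the target problem.
    base = VGG16(weights='imagenet', include_top=False,
                 input_shape=(224, 224, 3))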
E. Processing Input Images
Input images are not always ready for training and may require some processing. The FER-13 dataset is ready for training, but others may not be. The extract and util.dataset_scripts packages provide utilities for image processing needs. In order to use the utility classes on datasets, it is required to have a data_file.csv file at the dataset root directory. Training and validation images should be put inside the train and val directories respectively.
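The expected dataset layout is therefore:

    dataset_root/
        data_file.csv   # required at the dataset root
        train/          # training images
        val/            # validation images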
The util.dataset_scripts package contains structuring scripts for creating data_file.csv for a dataset. Note that the structuring process is specific to each dataset. If there is no structuring script for your dataset in the package, you will need to add one. The existing structuring scripts can be used as examples.
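A hypothetical structuring script might walk the dataset and emit one row per image. The column layout below (and the assumption of one subdirectory per class) is a guess for illustration; mirror one of the existing scripts for the real format.

    import csv, os

    # Hypothetical sketch: the data_file.csv columns are an assumption.
    root = '/path/to/dataset_root'
    with open(os.path.join(root, 'data_file.csv'), 'w') as out:
        writer = csv.writer(out)
        for split in ('train', 'val'):
            split_dir = os.path.join(root, split)
            for label in os.listdir(split_dir):
                label_dir = os.path.join(split_dir, label)
                for name in os.listdir(label_dir):
                    writer.writerow([split, label, os.path.join(label_dir, name)])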
E.1. Face Alignment
Datasets may require face alignment before training: in all the images, the eyes and nose should be at the same positions. Without alignment, training may require more data. If multiple datasets will be used, they should share the same face alignment.
extract.extract_faces.py contains the ExtractFaces class, which can be used to extract and align faces from source images. For face extraction and alignment, the MtCnnDetector from https://github.com/pangyupo/mxnet_mtcnn_face_detection is used.
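For reference, that repository's detector is used roughly as below; the chip size and padding values are illustrative, taken from the repository's own example rather than from this framework.

    import cv2
    import mxnet as mx
    from mtcnn_detector import MtcnnDetector

    detector = MtcnnDetector(model_folder='model', ctx=mx.cpu(0))
    img = cv2.imread('face.jpg')
    results = detector.detect_face(img)
    if results is not None:
        boxes, points = results  # bounding boxes and facial landmarks
        # Crop face chips aligned on the detected landmarks.
        chips = detector.extract_image_chips(img, points, 144, 0.37)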
E.2. Pre-Processing
It is important to prepare source images for training. extract.preprocess.PreprocessFaces can be used to pre-process aligned face images. It applies grayscale conversion and histogram equalization, and resizes using cubic interpolation.
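The same steps can be reproduced with OpenCV; this is a sketch of the described pipeline, not the class's actual code.

    import cv2

    def preprocess(image, size=224):
        # Grayscale conversion, histogram equalization, then cubic resize,
        # as described above; 224 matches the VGGFace input dimension.
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        equalized = cv2.equalizeHist(gray)
        return cv2.resize(equalized, (size, size),
                          interpolation=cv2.INTER_CUBIC)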