Monday, August 10, 2015

very basic intro to Caffe

This tutorial just gives you a basic idea of what Caffe can do when defining your own network to solve a Computer Vision task. It does not cover advanced use of Caffe, such as defining your own layers or using the C++/Python API.

Caffe is a framework that can build different architectures of neural networks, where you can train, test and deploy your models. It can serve as a testing ground for deep neural nets (if you want to try different combinations of existing neural net models).

I'm assuming you already have a background in Machine Learning and Statistical Pattern Recognition, and have built Neural Networks in the past using some programming language.

Caffe lets you create and test neural nets of different styles using prototxt files. You write text files specifying the architecture of a neural net, how fast a net should solve, where it should pick the data from, what it should generate as output, etc.

To use Caffe (without the Caffe API) you need to create a data file, a network definition file and a solver parameter file.

Here is how you create a dataset for Caffe (there are multiple ways of doing this). If you want a black-box technique, use this one.

This was done on Ubuntu 14.04 with an NVIDIA GPU.

Things to do:
  1. Put all your train images in a folder. It should contain images from all classes.
  2. Put all your test images in a folder. It should contain images from all classes.
  3. Create train.txt and test.txt listing the files in the respective folders. Each line of these files should be "name_of_the_file.jpg <space> label_or_class_id"

You then pass these listing files, along with the image folders, to Caffe's convert_imageset binary.
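As a rough sketch of step 3, here is one way to generate train.txt with a small Python script. The labels.csv mapping and the folder/database names are made up for illustration, so adapt them to your own data; the convert_imageset call it feeds into is shown as a comment.

    # Hypothetical labels.csv with lines like "cat_001.jpg,0" -- adapt to your own labels.
    import csv

    with open('labels.csv') as f, open('train.txt', 'w') as out:
        for filename, class_id in csv.reader(f):
            out.write('%s %s\n' % (filename, class_id))

    # You would then run something like (paths are placeholders):
    #   $CAFFE_ROOT/build/tools/convert_imageset --shuffle --backend=lmdb \
    #       train_images/ train.txt train_lmdb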

Now, you define your network in layers using the protobuf format. More on that is available here. You will need to define the network for training, testing and deployment (the input and output layers change). These three need their own network definitions in 3 separate files.

In the train and test network files, you will specify where the source data is (the train and test LMDB files you created, and also the image mean files). You can also scale the image values for better performance (e.g., scale the 0-255 pixel values down to the 0-1 range). Using image mean files drastically improves performance compared to not using them.
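If you prefer generating the train/test prototxt from Python rather than writing it by hand, PyCaffe's net_spec helper can emit an equivalent file. A minimal sketch, assuming an LMDB at 'train_lmdb' and a mean file 'mean.binaryproto' (both placeholders), with the mean subtraction and scaling applied in the data layer as described above:

    import caffe
    from caffe import layers as L, params as P

    n = caffe.NetSpec()
    # Data layer: reads the LMDB you created, subtracts the image mean, then scales values down.
    n.data, n.label = L.Data(batch_size=64, backend=P.Data.LMDB, source='train_lmdb',
                             transform_param=dict(mean_file='mean.binaryproto', scale=1./255),
                             ntop=2)
    # A tiny example body: one hidden layer and a softmax loss.
    n.ip1 = L.InnerProduct(n.data, num_output=100, weight_filler=dict(type='xavier'))
    n.relu1 = L.ReLU(n.ip1, in_place=True)
    n.ip2 = L.InnerProduct(n.relu1, num_output=10, weight_filler=dict(type='xavier'))
    n.loss = L.SoftmaxWithLoss(n.ip2, n.label)

    with open('train_net.prototxt', 'w') as f:
        f.write(str(n.to_proto()))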

Now, you define your solver. It contains parameters that tell Caffe what kind of gradient descent to use, how many iterations to run, the learning rate, etc., and how frequently the progress of training/testing should be printed to the screen (to see whether the network configuration is performing as expected). In the solver file you also (if you want to) point to the specific network definition files you want to use.
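A sketch of generating a solver file from Python via Caffe's protobuf classes; you can equally well type the same fields into solver.prototxt by hand. The numbers below are placeholders, not recommendations.

    from caffe.proto import caffe_pb2

    s = caffe_pb2.SolverParameter()
    s.train_net = 'train_net.prototxt'      # which network definition to use
    s.test_net.append('test_net.prototxt')
    s.test_iter.append(100)                 # test batches per testing pass
    s.test_interval = 500                   # test every 500 training iterations
    s.base_lr = 0.01                        # learning rate
    s.momentum = 0.9
    s.lr_policy = 'step'
    s.gamma = 0.1
    s.stepsize = 5000
    s.display = 100                         # print progress every 100 iterations
    s.max_iter = 10000                      # how many iterations to train for
    s.snapshot = 5000
    s.snapshot_prefix = 'my_model'
    s.solver_mode = caffe_pb2.SolverParameter.GPU

    with open('solver.prototxt', 'w') as f:
        f.write(str(s))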

Once all three steps are completed, you can train the network and watch its performance according to the configuration defined in step 3 (the solver).
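Training itself is a one-liner. The usual route is the command-line tool, roughly $CAFFE_ROOT/build/tools/caffe train --solver=solver.prototxt, but the same can be driven from Python; a sketch:

    import caffe

    caffe.set_mode_gpu()                     # or caffe.set_mode_cpu()
    solver = caffe.get_solver('solver.prototxt')
    solver.solve()                           # trains and tests per the solver settings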

In the end, you will be given a .caffemodel file. You can use this file together with the deployment network file (defined in prototxt) and load your own data using a simple C++/Python program. For the Python interface, you can look at the sample ipynb files provided in Caffe's examples folder.
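A minimal deployment sketch in Python. The file names are placeholders, and it assumes the deploy net's input blob is called 'data', its output blob 'prob', and a batch size of 1 in deploy.prototxt (the conventions used in the Caffe examples):

    import caffe

    caffe.set_mode_cpu()
    net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)

    # Preprocess roughly as during training: HxWxC -> CxHxW, 0-255 range, RGB -> BGR.
    transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
    transformer.set_transpose('data', (2, 0, 1))
    transformer.set_raw_scale('data', 255)      # caffe.io.load_image returns values in [0, 1]
    transformer.set_channel_swap('data', (2, 1, 0))

    image = caffe.io.load_image('test.jpg')
    net.blobs['data'].data[...] = transformer.preprocess('data', image)
    out = net.forward()
    print('predicted class:', out['prob'].argmax())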

If you want to visualize the network graph you defined, you can use the Python script inside <Caffe_dir>/.
P.S. If this seems too complex for you, you could install NVIDIA's DIGITS software. It provides a GUI through a web-browser interface and lets you do the above tasks. It lets you download the trained models in the end. You can set parameters for training and solver files, scale data as it is fed to the input, etc. You can also fine-tune models and customize popular models (like LeNet, AlexNet, GoogLeNet).

Saturday, August 1, 2015

Notes from NVIDIA's Intro to Deep Learning (Q&A session)

These are a few notes I took from NVIDIA's intro deep learning class. It's for folks trying to learn deep learning and apply deep learning techniques to their research or products (companies). To follow along, you will need basic knowledge of pattern recognition and machine learning in general. I would recommend going through the prerequisites and the basics of deep learning first.

How to determine optimal network structure?
Answer: Determining the number of layers in a deep network is still a research topic. It's more of an art than a science. Choose a network architecture similar to other networks that were trained on similar data. For example, choose a LeNet-style architecture for digit or character recognition. Look for existing publications and examples.

If your network isn't over-fitting, it isn't large enough. Keep increasing the number of layers and parameters until you get really good training accuracy, then go ahead and try testing.

How to understand multiple layers of a deep network?
Answer: One of the criticisms of deep learning is that it's seen as a black-box technology (not many people understand what's going on underneath). If we are talking about convolutional neural nets, one way to visualize what they are doing is deconvolution. One can also visualize the filters of vanilla neural nets. NVIDIA's DIGITS software can help you visualize the inner layers of a deep network. You can also do this manually in Caffe, Theano, etc. (other deep learning frameworks). The idea is that lower layers learn edges, higher layers learn combinations of edges, and so on.

Look for the deconvolution paper (which tries to find which input maximally activates a neuron) to understand what the network is learning.
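As one concrete way of "doing it manually in Caffe", you can pull out the learned first-layer filters of a trained net and plot them. A sketch; the file names and the layer name 'conv1' are placeholders, and it assumes the first layer sees 1- or 3-channel input:

    import caffe
    import matplotlib.pyplot as plt

    net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)
    filters = net.params['conv1'][0].data            # shape: (num_filters, channels, h, w)
    filters = (filters - filters.min()) / (filters.max() - filters.min())  # normalize to 0-1

    for i in range(min(16, filters.shape[0])):       # show the first 16 filters
        plt.subplot(4, 4, i + 1)
        plt.imshow(filters[i].transpose(1, 2, 0).squeeze())
        plt.axis('off')
    plt.show()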

What are the usual training and testing durations for a network? What about the deployment phase?
Answer: It varies. It depends on the size of the dataset, RAM, processing power of the PC, presence of a GPU, memory, etc.

For a large dataset:
Training takes hours/days/weeks (we need to do a feed-forward and back-propagation pass for every image sample and update weights throughout).
Testing is feed-forward only (usually fast).
Deployment is also feed-forward and usually takes very little time. Theano and Caffe can be set to use the CPU or GPU for the feed-forward pass (not just during the training phase).
I've measured this personally on a convolutional neural network (through Caffe): one sample image (256 x 256) took 0.16 seconds for a feed-forward pass on the CPU and 0.016 seconds on the GPU. I used an NVIDIA Quadro 2100M.

Are there any limitations on dimensions of an input in a deep network?
Answer: For vanilla neural networks and convolutional nets, the input size is usually fixed.
This is done to support batched training: we can vectorize network operations and train quickly. To get inputs to the fixed size, we crop them, resize them, etc.

What is Fine Tuning?
Answer: Let's say you have some data and you train a network. Later you come across another class of data to be added to the classification problem. You can initialize your new network with the previous weights (on the lower layers), add new structure to the upper layers and fine-tune them (perform feed-forward and back-propagation from the classification layer back to the input). It is an effective and fast way to include new classes of data. The basic idea is that lower layers of the network learn edges, while higher layers learn combinations of these edges. So the higher layers change, whereas the lower layers usually keep similar weights.
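In Caffe this amounts to initializing a new solver's net from the old weights: layers whose names match the pretrained model keep its weights, while renamed or newly added layers (e.g. a new final classifier sized for the extra classes) start from fresh initialization. A sketch, with placeholder file names:

    import caffe

    caffe.set_mode_gpu()
    solver = caffe.get_solver('finetune_solver.prototxt')
    solver.net.copy_from('pretrained.caffemodel')    # lower layers reuse the old weights
    solver.solve()                                   # fine-tune on the new data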

What is transfer learning?
Answer: Use the network architecture and weights on new data (from a similar source), then fine-tune the network. For example, if we train a network to detect images of animals, we can use the same weights to initialize a similar network architecture for a classifier that detects humans.

What are some of the techniques followed to pre-process the data?
Answer: Subtract the mean across each training image. Standardize the values (e.g., scale them to the 0-1 range). PCA, LDA and whitening can also be applied. Subtracting the mean of the data and normalizing the range of values goes a long way toward improving the performance of the classification task (generally speaking).
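A sketch of the two simplest steps (range normalization and mean subtraction) in NumPy, on a made-up stack of 0-255 images:

    import numpy as np

    images = np.random.randint(0, 256, size=(100, 32, 32, 3)).astype(np.float32)

    images /= 255.0                      # standardize values to the 0-1 range
    mean_image = images.mean(axis=0)     # per-pixel mean over the training set
    images -= mean_image                 # subtract the mean from every image
    # PCA / LDA / whitening would be applied on top of this, typically on flattened vectors.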

Will weights fluctuate or converge smoothly when more data is added to the training set?
Answer: Not always. Training usually converges smoothly, but there are cases where it may fluctuate; it depends on the data you are adding. Sometimes the weights may stop changing (vanishing gradient) because the gradient being propagated is very small. We then have to tune the learning rate and how we initialize the weights. Never initialize the weights to zero or symmetrically (all zeros, all ones, or any single number). Initialize them to non-zero, positive random numbers less than 1.

How to use multiple GPUs for training neural nets?
Answer: It is still a research topic. There are several ways:
1) Data parallelism: use the same network architecture, split the training data between GPUs, train separate nets, and exchange parameters across the GPUs so that the final network parameters become a combination of the nets trained on the different GPUs.
2) Model parallelism: split the same network across GPUs and train the layers separately on each GPU.

How many epochs/iterations you train the network for? What are batches?
Answer: Epoch and iteration are often used interchangeably; an epoch means every training sample has been fed to the network once. A typically used number is 1000.
Batch: the number of images used in one pass.
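Strictly speaking, one epoch is one pass over all training samples, while one iteration processes one batch. A small worked example with made-up numbers:

    num_train_samples = 50000
    batch_size = 100                     # images used in one pass (one iteration)

    iterations_per_epoch = num_train_samples // batch_size    # 500 iterations = 1 epoch
    epochs = 20
    total_iterations = epochs * iterations_per_epoch          # 10000 solver iterations
    print(iterations_per_epoch, total_iterations)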

How to determine batch size?
Answer: It depends on the size of individual training examples and the number of parameters in the network. Experiment with different batch sizes based on the amount of RAM or GPU memory. In Caffe, if the batch size is too large for your computer's memory, the process will fail before training starts. Try small batch sizes: 8, 16, 64, 100, 256, ....

How to avoid overfitting?
Answer: We usually monitor the model's performance on both the training set and a validation set. If, after some time, training accuracy keeps increasing but validation accuracy does not, we are overfitting the network; we need to stop training and save the weights at that point. Other techniques are also available (e.g., dropout), and getting more data helps.
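A sketch of this monitor-and-stop loop with PyCaffe. It assumes solver.prototxt defines a TEST-phase net with an Accuracy layer whose top blob is named 'accuracy'; the step size and patience are placeholders:

    import caffe

    caffe.set_mode_gpu()
    solver = caffe.get_solver('solver.prototxt')

    best_val_acc, patience, bad_checks = 0.0, 5, 0
    for check in range(100):
        solver.step(500)                               # train for 500 more iterations
        solver.test_nets[0].forward()                  # run one validation batch
        val_acc = float(solver.test_nets[0].blobs['accuracy'].data)
        if val_acc > best_val_acc:
            best_val_acc, bad_checks = val_acc, 0
            solver.net.save('best.caffemodel')         # keep the best weights so far
        else:
            bad_checks += 1
            if bad_checks >= patience:                 # validation stopped improving: stop
                break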

What to use for time series data, Convolutional Nets or Recurrent Neural Nets?
Answer: Usually for time series data we use a recurrent neural net (designed for temporal data). If the time series data is one-dimensional, a convolutional net usually isn't chosen; convolutional nets apply when there is a spatial component. If the time series data has both temporal and spatial components, we can treat the data as an image (like audio data represented in different bands with different colors, as in a spectrogram) and use a convolutional net instead of a recurrent net.

All modern deep learning frameworks make use of GPUs. You don't necessarily have to learn GPU programming unless you plan to customize those frameworks (write your own activation function, a customized layer, vectorize a layer on the GPU, etc.). Usually you can make use of the GPU with the flick of a switch (turn on the "use GPU" flag). In Theano, you change the config file settings; in Caffe, you can set GPU or CPU through the solver file or the API (PyCaffe).
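For Caffe the flick of a switch really is one or two calls (or the solver_mode field in the solver file); the device id 0 below is an assumption:

    import caffe

    caffe.set_device(0)      # which GPU to use
    caffe.set_mode_gpu()     # or caffe.set_mode_cpu() to fall back to the CPU
    # In Theano the equivalent switch is the device/floatX settings in ~/.theanorc.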

From shallow to deep networks (picked from Yann LeCun's ICML 2013 presentation).

Winners of NASA Centennial Challenge: Sample Return Robot (2015)

I had a fun time at WPI (Massachusetts). I was part of the team (WVU Mountaineers) competing in the NASA Sample Return Robot Challenge (2015). The task was to design an autonomous robot to detect and pick up objects of interest in a large 20-acre field (navigating without GPS, WiFi, etc.).

Here is a video about it:

Cataglyphis (our robot)

I was on the computer vision team (a 2-person team). I designed the machine learning and object recognition module.

The WVU Mountaineers

We used ROS, OpenCV, Sklearn for the computer vision module.

We stood first in the competition and won $100,000 for our efforts.

The WVU Mountaineers
You can check out more details here.