Sunday, March 13, 2016

Simple Autoencoder on MNIST dataset

So, I had fun with Theano and trained an autoencoder on the MNIST dataset.

An autoencoder is a simple neural network (with one hidden layer) which reproduces the input passed to it. By controlling the number of hidden neurons, we can learn interesting features from the input, and the data can be compressed as well (sounds like PCA). Autoencoders can be used for unsupervised feature learning, and data transformed using autoencoders can be used for supervised classification.
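As a sketch of the idea in plain NumPy (rather than Theano — the 784/10 sizes match the post, but the tied weights, random "images" and everything else here are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_autoencoder(n_visible, n_hidden, rng):
    # Tied weights: the decoder reuses W transposed, a common autoencoder choice.
    W = rng.uniform(-0.1, 0.1, size=(n_visible, n_hidden))
    return W, np.zeros(n_hidden), np.zeros(n_visible)

def reconstruct(x, W, b_hidden, b_visible):
    h = sigmoid(x @ W + b_hidden)          # encode: compress to n_hidden units
    x_hat = sigmoid(h @ W.T + b_visible)   # decode: reproduce the input
    return h, x_hat

rng = np.random.default_rng(0)
W, bh, bv = init_autoencoder(784, 10, rng)   # 784 pixels -> 10 hidden units
x = rng.random((16, 784))                    # a fake mini-batch of "images"
h, x_hat = reconstruct(x, W, bh, bv)
mse = np.mean((x - x_hat) ** 2)              # reconstruction error to minimize
```

Training then amounts to pushing `mse` down by gradient descent on the weights.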

More about autoencoders is available here. More variants of autoencoders (sparse, contractive, etc.) exist, with different constraints on the hidden layer representation.

I trained a vanilla autoencoder for 100 epochs with a mini-batch size of 16 and a learning rate of 0.01.

Here are the figures for digit 7 with hidden sizes 10 and 20 (the original data was the MNIST training set, with 784-dimensional feature vectors). Each image of that digit was passed to the autoencoder, and each hidden unit was visualized (after computing the mean).
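The training setup can be sketched in plain NumPy (the real model was written in Theano and trained on MNIST; the random stand-in data, untied weights and linear decoder here are assumptions made for a short, self-contained example — only the batch size and learning rate come from the post):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.random((128, 784))                  # hypothetical stand-in for MNIST

n_hidden, lr, batch_size, epochs = 10, 0.01, 16, 5   # lr and batch as in the post
W1 = rng.uniform(-0.05, 0.05, (784, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.uniform(-0.05, 0.05, (n_hidden, 784)); b2 = np.zeros(784)

losses = []
for epoch in range(epochs):
    perm = rng.permutation(len(X))
    for i in range(0, len(X), batch_size):
        x = X[perm[i:i + batch_size]]
        h = sigmoid(x @ W1 + b1)            # encode
        y = h @ W2 + b2                     # decode (linear output)
        dy = 2.0 * (y - x) / len(x)         # gradient of mean squared error
        dh = dy @ W2.T * h * (1.0 - h)      # backprop through the sigmoid
        W2 -= lr * h.T @ dy;  b2 -= lr * dy.sum(axis=0)
        W1 -= lr * x.T @ dh;  b1 -= lr * dh.sum(axis=0)
    losses.append(np.mean((sigmoid(X @ W1 + b1) @ W2 + b2 - X) ** 2))
```

After each epoch the reconstruction error over the whole set is recorded; it should fall steadily with these settings.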

Wednesday, March 9, 2016

simple Convolutional Neural Net based object recognition

I made a simple object recognition module over the last weekend. I wrote a Theano-based convolutional neural network and a simple OpenCV-based image segmentation program.

The output from the image segmentation program is passed to the Theano based Convolutional Neural Network.

I used 10,000 images for training and 16,000 images for testing.

The black circles in the video indicate the regions where the classifier is looking and red circles indicate true positives found by the algorithm.

Monday, August 10, 2015

very basic intro to Caffe

This tutorial just gives you a basic idea of how to define your own network in Caffe to solve a computer vision task; it does not cover advanced use of Caffe (defining your own layers, or using the C++/Python API).

Caffe is a framework for building different neural network architectures, in which you can train, test and deploy your models. It can serve as a testing ground for deep neural nets (if you want to try different combinations of existing neural net models).

I'm assuming you already have a background in Machine Learning, Statistical Pattern Recognition and have built Neural Networks in the past using some programming language.

Caffe lets you create and test neural nets of different styles using prototxt files. You write text files specifying the architecture of the net, how the solver should behave, where the net should pick its data from, what it should generate as output, and so on.

To use Caffe directly (as opposed to the Caffe API) you need to create a data file, a network definition file and a solver parameter file.

Here is how you create a dataset for Caffe from images (there are multiple ways of doing this). If you want a black-box technique, use this one.

This was done on Ubuntu 14.04 with an NVIDIA GPU.

Things to do:
  1. Put all your train images in a folder. It should contain images from all classes.
  2. Put all your test images in a folder. It should contain images from all classes.
  3. Create train.txt and test.txt listing the files in the respective folders. Each line of these files should read "name_of_the_file.jpg <space> label_or_class_id".
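A small script like this can generate the listing files in the expected format (the file names and labels below are made up for illustration):

```python
import os
import tempfile

def make_listing(labels, out_path):
    # labels maps an image file name to its integer class id.
    # convert_imageset expects one "filename<space>label" per line.
    lines = ["%s %d" % (name, labels[name]) for name in sorted(labels)]
    with open(out_path, "w") as f:
        f.write("\n".join(lines) + "\n")
    return lines

labels = {"cat_001.jpg": 0, "dog_042.jpg": 1}
train_txt = os.path.join(tempfile.mkdtemp(), "train.txt")
lines = make_listing(labels, train_txt)
```

Run it once for the train folder and once for the test folder.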

You then pass this data to Caffe's convert_imageset binary.
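For example (the paths are placeholders, and convert_imageset must be built from the Caffe sources first):

```shell
# Build an LMDB from the image folder and the listing file created above.
# --shuffle randomizes the order; resizing is optional but common.
convert_imageset --shuffle --resize_height=256 --resize_width=256 \
    /path/to/train_images/ train.txt train_lmdb
```

Repeat with test.txt to produce the test LMDB.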

Now, you define your network layer by layer in protobuf format. More on that is available here. You will need to define the network for training, testing and deployment (the input and output layers change), so these three need their own network definitions in three separate files.
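A minimal train-phase definition might look like this (layer names, paths and sizes are placeholders; note how the data layer points at the LMDB, the mean file and a scale factor):

```
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  transform_param {
    scale: 0.00390625              # map 0-255 pixel values into 0-1
    mean_file: "mean.binaryproto"  # subtract the dataset mean image
  }
  data_param { source: "train_lmdb" backend: LMDB batch_size: 64 }
}
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "data"
  top: "ip1"
  inner_product_param { num_output: 10 }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
}
```

A real network would have convolution and pooling layers between the data and the classifier; this skeleton only shows the plumbing.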

In the train/test network file, you mention where the data source is (the train and test LMDB files you created, and the image mean files). You can also scale images for better performance (e.g. rescale the 0-255 pixel range). Using image mean files drastically improves performance compared to not using them.

Now, you configure your network's solver. It contains parameters that tell Caffe what kind of gradient descent to use, how many iterations to run, the learning rate, and so on, as well as how frequently the progress of training/testing should be printed to screen (to see whether the network configuration is performing as expected). In the solver file you also mention (if you want to) the specific network file you want to use.
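A solver file is a flat list of such parameters; the values below are illustrative, not recommendations:

```
net: "train_val.prototxt"   # which network definition to solve
base_lr: 0.01               # starting learning rate
lr_policy: "step"           # drop the rate by gamma every stepsize iterations
gamma: 0.1
stepsize: 10000
momentum: 0.9               # plain SGD with momentum is the default solver
max_iter: 50000             # how many iterations to run
display: 100                # print progress every 100 iterations
test_interval: 500          # run the test net every 500 iterations
test_iter: 100              # how many test batches per evaluation
snapshot: 5000              # save a .caffemodel every 5000 iterations
snapshot_prefix: "model"
solver_mode: GPU
```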

Once all three steps are completed, you can run training and watch the performance of the network as per the configuration defined in the solver file.

In the end, you will be given a .caffemodel file. You can use this file and the deployment network file (defined in prototxt), and load your own data, from a simple C++/Python program. For the sample Python interface, you can look at the sample ipynb files provided in Caffe's examples folder.
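A minimal Python deployment sketch might look like this (the file names and the output blob name "prob" are assumptions about your particular network, and pycaffe must be installed):

```python
import numpy as np
import caffe

caffe.set_mode_cpu()  # or caffe.set_mode_gpu()

# Deployment definition + trained weights, loaded in test phase.
net = caffe.Net("deploy.prototxt", "model_iter_50000.caffemodel", caffe.TEST)

# load_image returns an H x W x C float image in [0, 1];
# Caffe blobs are laid out as (batch, channel, height, width).
image = caffe.io.load_image("example.jpg")
net.blobs["data"].data[...] = image.transpose(2, 0, 1)[np.newaxis]

out = net.forward()
prediction = out["prob"].argmax()  # index of the highest-scoring class
```

Any preprocessing you used at training time (mean subtraction, scaling, resizing) has to be reproduced here as well.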

If you want to visualize the network graph you have defined, you can use the python script inside <Caffe_dir>/

You can do it as follows:
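For example, from the Caffe root directory (the input prototxt is whichever network you want drawn; draw_net.py needs the pydot/graphviz Python dependencies):

```shell
python python/draw_net.py train_val.prototxt net_diagram.png
```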
P.S. If this seems too complex for you, you could install the NVIDIA DIGITS software. It uses a GUI through a web-browser interface and lets you do the above tasks; it lets you download the trained models in the end. You can set parameters for training and the solver, scale data as it is fed to the input, etc. You can also finetune models and customize popularly available models (like LeNet, AlexNet, GoogLeNet).