Wednesday, March 2, 2011

Implementing Face Tracer search engine: Part 1

Hi all. I'm trying to implement the Face Tracer search engine described in a paper from Columbia University. The concept is very interesting: parts of it seem obvious, but it is quite novel as a search engine implementation.

So far, I've downloaded the database and am looking at ways to train an SVM for each attribute. The author has not provided the entire database on his website. Instead, he asks us to write a program that downloads the URLs listed in a text file. So I've written a small Python script that parses the entire file and downloads the files one by one.

I used the urlretrieve function from Python's urllib module to download each file.
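A minimal sketch of such a download script could look like the following. It is written for Python 3, where urlretrieve lives in urllib.request. The layout of faceindex.txt is an assumption here: the sketch simply takes the first whitespace-separated field on each line that starts with http. The names extract_url, download_all and failed_urls.txt, and the line-number file naming, are made up for illustration.

```python
import http.client
import os
import urllib.request


def extract_url(line):
    """Return the first whitespace-separated field that looks like a URL, or None.

    Assumption: each useful line of the index contains one image URL
    somewhere among its fields.
    """
    for field in line.split():
        if field.startswith(("http://", "https://")):
            return field
    return None


def download_all(index_path, out_dir, failed_log="failed_urls.txt"):
    """Download every URL found in index_path into out_dir, one by one.

    URLs that cannot be fetched are written to failed_log.
    Returns the number of files downloaded.
    """
    os.makedirs(out_dir, exist_ok=True)
    downloaded = 0
    with open(index_path) as index, open(failed_log, "w") as failed:
        for number, line in enumerate(index):
            url = extract_url(line)
            if url is None:
                continue  # comment, blank line, or line without a URL
            # Hypothetical naming scheme: the line number in the index.
            target = os.path.join(out_dir, "%06d.jpg" % number)
            try:
                urllib.request.urlretrieve(url, target)
                downloaded += 1
            except (OSError, ValueError, http.client.HTTPException) as err:
                # Dead links are expected, so record them and move on.
                failed.write("%s\t%s\n" % (url, err))
    return downloaded
```

Calling download_all("faceindex.txt", "images") would then fetch whatever is still reachable and leave the dead links in the log.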

The database consists of 15,000 images. I tried to download all of them, but many were no longer available, as the paper and the links are themselves 3 years old. To make things quicker, I divided the faceindex.txt file into 5 parts (faceindex1.txt, faceindex2.txt, faceindex3.txt, faceindex4.txt, faceindex5.txt), with 3,000 image URLs in each file. After that the download was fast enough (it finished in 3 hours).
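Splitting the index can be scripted too. Here is a rough sketch, assuming the index has one entry per line. The function name split_index is made up, and the output names follow the faceindex1.txt ... faceindex5.txt pattern mentioned above.

```python
import os


def split_index(index_path, parts=5):
    """Split a line-based index file into `parts` roughly equal files.

    "faceindex.txt" becomes "faceindex1.txt", "faceindex2.txt", and so on.
    Returns the list of file names written.
    """
    with open(index_path) as f:
        lines = f.readlines()

    chunk = -(-len(lines) // parts)  # ceiling division
    root, ext = os.path.splitext(index_path)

    names = []
    for i in range(parts):
        name = "%s%d%s" % (root, i + 1, ext)
        with open(name, "w") as out:
            out.writelines(lines[i * chunk:(i + 1) * chunk])
        names.append(name)
    return names
```

With 15,000 lines and parts=5, each output file gets 3,000 lines.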

I also used the PIL module to open each image and check whether it is valid. If an exception was raised, the image was considered corrupt and was logged to a file, which gave me a list of the corrupt files in the database.
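The check can be sketched roughly as below. This assumes the Pillow fork of PIL is installed, and it calls verify() on top of open(), which is a slightly stricter test than just opening the file. The names is_valid_image, log_corrupt_images and corrupt_images.txt are made up for illustration.

```python
import os

from PIL import Image


def is_valid_image(path):
    """Return True if PIL can open and verify the file, False otherwise."""
    try:
        with Image.open(path) as img:
            img.verify()  # raises an exception if the file looks broken
        return True
    except Exception:
        # Any failure here is treated as "corrupt image".
        return False


def log_corrupt_images(image_dir, log_path="corrupt_images.txt"):
    """Write the names of all invalid files in image_dir to log_path.

    Returns the list of corrupt file names.
    """
    corrupt = []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        if os.path.isfile(path) and not is_valid_image(path):
            corrupt.append(name)
    with open(log_path, "w") as log:
        for name in corrupt:
            log.write(name + "\n")
    return corrupt
```

Note that verify() only inspects the file structure. It does not decode the full image, so a truncated file can occasionally slip through.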

For convenience, I've arranged the dataset into different folders according to the facelabels.txt file.
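A rough sketch of that sorting step is below. The file layout is assumed, not taken from the dataset documentation: each line of facelabels.txt is treated as three whitespace-separated fields (face id, attribute name, label), and each image is assumed to be stored as <face id>.jpg. The function name arrange_by_label and the out_dir/attribute/label folder scheme are made up as well.

```python
import os
import shutil


def arrange_by_label(labels_path, image_dir, out_dir):
    """Copy images into out_dir/<attribute>/<label>/ folders.

    Assumed line format of labels_path: "<face_id> <attribute> <label>".
    Assumed image naming: image_dir/<face_id>.jpg.
    Returns the number of files copied.
    """
    copied = 0
    with open(labels_path) as labels:
        for line in labels:
            fields = line.split()
            if line.startswith("#") or len(fields) < 3:
                continue  # skip comments and malformed lines
            face_id, attribute, label = fields[:3]
            source = os.path.join(image_dir, face_id + ".jpg")
            if not os.path.isfile(source):
                continue  # image was never downloaded or was removed
            target_dir = os.path.join(out_dir, attribute, label)
            os.makedirs(target_dir, exist_ok=True)
            shutil.copy(source, target_dir)
            copied += 1
    return copied
```

Copying rather than moving keeps the original download folder intact, since a face may carry labels for more than one attribute.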

Now that I have the images, I need to use AdaBoost to select the best classifiers and then train the SVMs to implement the search engine. I'll keep you posted with updates.
