Learning people annotation from the web via consistency learning

The phenomenal growth of images and videos on the web, together with the increasing sparseness of the accompanying metadata, forces us to look for signals in the image and video content itself for search, information retrieval, and browsing-based corpus exploration. One prominent type of information that users look for when searching or browsing such corpora is information about the people present in the images and videos. While face recognition has matured to some extent over the past few years, the problem remains hard because of (a) the absence of labelled data for the large set of celebrities that users look for and (b) the variability of age, makeup, expression, and pose in the target corpus. We propose a learning paradigm, which we refer to as consistency learning, that addresses both issues by posing the problem as one of learning from a weakly labelled training set. We use text-image co-occurrence on the web as a weak signal of relevance and learn a set of consistent face models from this very large and noisy training set. The resulting system learns face models for a large set of celebrities directly from the web and uses them to tag images and videos for better retrieval. Although the proposed method has been applied to faces, we see it as broadly applicable to any learning problem for which a suitable similarity metric is defined. We present results on learning from a very large dataset of 37 million images, resulting in a validation accuracy of 92.68%.
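The abstract does not spell out the algorithm, so the following is only a toy sketch of the general idea of keeping the mutually consistent examples from a weakly labelled set under a similarity metric. It is not the authors' method. The function names, the use of cosine similarity, the thresholds `sim_threshold` and `min_support`, and the two-dimensional feature vectors are all assumptions made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def consistent_subset(features, sim_threshold=0.9, min_support=0.5):
    """Return indices of examples that agree with enough of the others.

    An example is kept when at least `min_support` of the remaining
    examples are at least `sim_threshold` similar to it. Both values
    are hypothetical and chosen only for this illustration.
    """
    n = len(features)
    if n < 2:
        return []  # consistency is undefined for fewer than two examples
    kept = []
    for i in range(n):
        agreeing = sum(
            1
            for j in range(n)
            if j != i and cosine(features[i], features[j]) >= sim_threshold
        )
        if agreeing / (n - 1) >= min_support:
            kept.append(i)
    return kept


# Hypothetical face features retrieved for one name: four similar vectors
# and one outlier (index 3) standing in for a wrongly associated face.
candidates = [
    [1.0, 0.0],
    [0.98, 0.05],
    [0.95, 0.10],
    [0.0, 1.0],
    [0.97, -0.02],
]
kept = consistent_subset(candidates)
print(kept)
```

In this toy setting the outlier is dropped because no other example resembles it, while the four mutually similar vectors support one another. A real system would work on learned face descriptors and a far larger, noisier candidate set.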
