Paper information - Learning people annotation from the web via consistency learning

Learning people annotation from the web via consistency learning

The phenomenal growth of images and video on the web, together with the increasing sparseness of the accompanying meta information, forces us to look for signals in the image/video content itself for search, information retrieval, and browsing-based corpus exploration. One of the prominent types of information that users look for while searching or browsing such corpora is information about the people present in the image or video. While face recognition has matured to some extent over the past few years, this problem remains hard because of (a) the absence of labelled data for the large set of celebrities that users look for and (b) the variability of age, makeup, expression, and pose in the target corpus. We propose a learning paradigm, which we refer to as consistency learning, that addresses both issues by posing the problem as one of learning from a weakly labelled training set. We use text-image co-occurrence on the web as a weak signal of relevance and learn the set of consistent face models from this very large and noisy training set. The resulting system learns face models for a large set of celebrities directly from the web and uses them to tag images and video for better retrieval. While the proposed method has been applied to faces, we see it as broadly applicable to any learning problem for which a suitable similarity metric is defined. We present results on learning from a very large dataset of 37 million images, resulting in a validation accuracy of 92.68%.
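The abstract does not spell out the algorithm, so the following is only a minimal illustrative sketch of the general idea of keeping the mutually consistent part of a noisy candidate set under a similarity metric. It is not the authors' method. The function names (`cosine`, `consistent_subset`), the cosine metric, the thresholds, and the toy feature vectors are all assumptions made for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def consistent_subset(features, sim_threshold=0.9, min_support=0.5):
    """Return indices of candidates that are similar to at least
    `min_support` of the other candidates (hypothetical criterion)."""
    n = len(features)
    keep = []
    for i in range(n):
        # Count how many other candidates "agree" with candidate i.
        votes = sum(
            1
            for j in range(n)
            if j != i and cosine(features[i], features[j]) >= sim_threshold
        )
        if n > 1 and votes / (n - 1) >= min_support:
            keep.append(i)
    return keep


# Toy data: four candidate face vectors that co-occurred with one name.
candidates = [
    (1.0, 0.1, 0.0),
    (0.9, 0.2, 0.0),
    (1.0, 0.0, 0.1),
    (0.0, 0.1, 1.0),  # an unrelated face picked up by noisy co-occurrence
]
kept = consistent_subset(candidates)
print(kept)
```

In this toy run the three mutually similar vectors are kept and the outlier is dropped. A pairwise scan like this is quadratic in the number of candidates, so at the scale the abstract mentions a real system would need something more efficient.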

Jay Yagnik | Atiq Islam

[1] David A. Forsyth, et al. Matching Words and Pictures, 2003, J. Mach. Learn. Res.

[2] Azriel Rosenfeld, et al. Face recognition: A literature survey, 2003, CSUR.

[3] Donald Geman, et al. Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images, 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] T. K. Leung, et al. Finding Faces in Cluttered Scenes using Random Labeled Graph Matching, 1995.

[5] Andrew W. Fitzgibbon, et al. On Affine Invariant Clustering and Automatic Cast Listing in Movies, 2002, ECCV.

[6] Pietro Perona, et al. Learning object categories from Google's image search, 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[7] Yee Whye Teh, et al. Names and faces in the news, 2004, CVPR 2004.

[8] David A. Forsyth, et al. Words and Pictures in the News, 2003, HLT-NAACL 2003.

[9] Pietro Perona, et al. A Visual Category Filter for Google Images, 2004, ECCV.

[10] Michael C. Burl, et al. Finding faces in cluttered scenes using random labeled graph matching, 1995, Proceedings of IEEE International Conference on Computer Vision.

[11] Donald Geman, et al. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, 1984.