The phenomenal growth of images and video on the web, together with the increasing sparseness of the accompanying metadata, forces us to look for signals in the image/video content itself for search, information retrieval, and browsing-based corpus exploration. One of the prominent types of information that users look for while searching or browsing such corpora is information about the people present in the images or video. While face recognition has matured to some extent over the past few years, the problem remains hard because of (a) the absence of labelled data for the large set of celebrities that users look for and (b) the variability of age, makeup, expression, and pose in the target corpus. We propose a learning paradigm, which we refer to as consistency learning, that addresses both issues by posing the problem as one of learning from a weakly labelled training set. We use text-image co-occurrence on the web as a weak signal of relevance and learn a set of consistent face models from this very large and noisy training set. The resulting system learns face models for a large set of celebrities directly from the web and uses them to tag images and video for better retrieval. While the proposed method has been applied to faces, we see it as broadly applicable to any learning problem for which a suitable similarity metric is defined. We present results on learning from a very large dataset of 37 million images, resulting in a validation accuracy of 92.68%.