Unsupervised learning of clutter-resistant visual representations from natural videos

Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance, e.g., position, scale, and viewing angle (1, 2, 3). Though the learning rules are not known, recent results (4, 5, 6) suggest the operation of an unsupervised temporal-association-based method, e.g., Földiák's trace rule (7). Such methods exploit the temporal continuity of the visual world by assuming that visual experience over short timescales will tend to have invariant identity content. Thus, by associating representations of frames from nearby times, a representation that tolerates whatever transformations occurred in the video may be achieved. Many previous studies verified that such rules can work in simple situations without background clutter, but the presence of visual clutter has remained problematic for this approach. Here we show that temporal association based on large class-specific filters (templates) avoids the problem of clutter. Our system learns in an unsupervised way from natural videos gathered from the internet, and is able to perform a difficult unconstrained face recognition task on natural images (Labeled Faces in the Wild (8)).
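
To make the idea of temporal association concrete, the following is a minimal sketch of a trace-style update for a single linear unit, in the form commonly attributed to the trace rule (7). It is not the system described in this abstract: the function name trace_rule, the parameters alpha (learning rate) and delta (trace mixing factor), and the toy input are all assumptions chosen for illustration. The unit's activity is smoothed over time, and the smoothed value gates a Hebbian weight change, so frames that occur close together in time pull the weights in a common direction.

```python
def trace_rule(frames, w, alpha=0.1, delta=0.2):
    """Illustrative trace-rule update for one linear unit (hypothetical helper).

    frames: sequence of equal-length feature vectors, ordered in time.
    w:      initial weight vector of the same length.
    alpha:  learning rate (assumed value).
    delta:  how strongly the current response enters the trace (assumed value).
    """
    w = [float(v) for v in w]
    trace = 0.0  # temporally smoothed activity of the unit
    for x in frames:
        # Instantaneous response of the unit to the current frame.
        y = sum(wi * xi for wi, xi in zip(w, x))
        # Exponential running average links the current frame to recent ones.
        trace = (1.0 - delta) * trace + delta * y
        # Hebbian step gated by the trace: weights move toward the input.
        w = [wi + alpha * trace * (xi - wi) for wi, xi in zip(w, x)]
    return w


# Toy usage: the same pattern shown for 50 consecutive frames.
weights = trace_rule([[1.0, 0.0]] * 50, [0.5, 0.5])
```

In this toy run the weights drift toward the repeated pattern. With a real video, consecutive frames would differ by some transformation, and the trace is what ties those differing frames to the same unit.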

[1] Doris Y. Tsao, et al. Functional Compartmentalization and Viewpoint Generalization Within the Macaque Face-Processing System, 2010, Science.

[2] Geoffrey E. Hinton, et al. ImageNet classification with deep convolutional neural networks, 2012, Commun. ACM.

[3] Joel Z. Leibo, et al. Learning and disrupting invariance in visual recognition with a temporal association rule, 2011, Front. Comput. Neurosci.

[4] Joel Z. Leibo, et al. Learning invariant representations and applications to face verification, 2013, NIPS.

[5] Michael W. Spratling. Learning viewpoint invariant perceptual representations from cluttered images, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Joel Z. Leibo, et al. Can a biologically-plausible hierarchy effectively replace face detection, alignment, and recognition pipelines?, 2013, ArXiv.

[7] Peter Földiák, et al. Learning Invariance from Transformation Sequences, 1991, Neural Comput.

[8] Shiguang Shan, et al. Fusing Robust Face Region Descriptors via Multiple Metric Learning for Face Recognition in the Wild, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Joel Z. Leibo, et al. The invariance hypothesis and the ventral stream, 2014.

[10] T. Poggio, et al. Hierarchical models of object recognition in cortex, 1999, Nature Neuroscience.

[11] Marwan Mattar, et al. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments, 2008.

[12] E. Oja. Simplified neuron model as a principal component analyzer, 1982, Journal of Mathematical Biology.

[13] Terrence J. Sejnowski, et al. Slow Feature Analysis: Unsupervised Learning of Invariances, 2002, Neural Computation.

[14] Joel Z. Leibo, et al. Subtasks of Unconstrained Face Recognition, 2014, 2014 International Conference on Computer Vision Theory and Applications (VISAPP).

[15] J. DiCarlo, et al. Neuronal Learning of Invariant Object Representation in the Ventral Visual Stream Is Not Dependent on Reward, 2012, The Journal of Neuroscience.

[16] Tal Hassner, et al. Multiple One-Shots for Utilizing Class Label Information, 2009, BMVC.

[17] Sharat Chikkerur, et al. Approximations in the HMAX Model, 2011.

[18] D. Hubel, et al. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex, 1962, The Journal of Physiology.

[19] Eric T. Carlson, et al. A neural code for three-dimensional object shape in macaque inferotemporal cortex, 2008, Nature Neuroscience.

[20] Edmund T. Rolls, et al. Invariant Visual Object and Face Recognition: Neural and Computational Bases, and a Model, VisNet, 2012, Front. Comput. Neurosci.

[21] Frédéric Jurie, et al. Face Recognition using Local Quantized Patterns, 2012, BMVC.

[22] Brian C. Lovell, et al. Multi-Region Probabilistic Histograms for Robust and Scalable Identity Inference, 2009, ICB.

[23] J. DiCarlo, et al. Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal Cortex, 2010, Neuron.

[24] Lorenzo Rosasco, et al. Unsupervised learning of invariant representations with low sample complexity: the magic of sensory cortex or a new framework for machine learning?, 2014.

[25] Frédéric Jurie, et al. Learning Visual Similarity Measures for Comparing Never Seen Objects, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Xiang Zhang, et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, 2013, ICLR.

[27] S. Gerber, et al. Unsupervised Natural Experience Rapidly Alters Invariant Object Representation in Visual Cortex, 2008.

[28] Josef Kittler, et al. Efficient processing of MRFs for unconstrained-pose face recognition, 2013, 2013 IEEE Sixth International Conference on Biometrics: Theory, Applications and Systems (BTAS).

[29] Niko Wilbert, et al. Invariant Object Recognition and Pose Estimation with Slow Feature Analysis, 2011, Neural Computation.

[30] Stan Z. Li, et al. Towards Pose Robust Face Recognition, 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[31] Thomas Serre, et al. Robust Object Recognition with Cortex-Like Mechanisms, 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32] Gaurav Sharma, et al. Local Higher-Order Statistics (LHS) for Texture Categorization and Facial Analysis, 2012, ECCV.

[33] Tomaso Poggio, et al. Fast Readout of Object Identity from Macaque Inferior Temporal Cortex, 2005, Science.