Using Guided Autoencoders on Face Recognition

In this master's thesis we create guided autoencoders (GAEs) and apply them to face recognition. GAEs are agents that interact with images using a novel combination of autoencoders and reinforcement learning. They perceive part of an image through a window, use an autoencoder to encode it, and react to what they see by moving the window. GAEs are trained to find and encode specific parts of the face, in our case the eyes, nose and mouth. We use the LFWC (cropped Labeled Faces in the Wild) dataset, which is highly varied and has many uncontrolled variables. We train GAEs with the CACLA reinforcement learning algorithm, which can deal with continuous states and actions. To create a state, a GAE evaluates its separately trained autoencoder on what is visible through its window; the resulting state guides its actions. We show that GAEs are able to navigate the complex landscapes of the face images using only local information. The experiments show that GAEs can find their goals if they are initialized relatively close to them. If we add position information to the encodings, performance increases greatly. We also compare deep stacked autoencoders with shallow autoencoders. Surprisingly, deep GAEs do not outperform shallow GAEs on this task. Finally, the GAEs are used to classify the gender of faces and whether a person is smiling. They are able to do classification, but do not rival state-of-the-art systems. Their flexibility, however, allows them to be extended easily to improve performance. In summary, GAEs do not currently outperform the state of the art on classification, but their ability to navigate complex images and their flexibility make them promising tools for face recognition and computer vision.
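The perceive, encode and move loop described above can be illustrated with a minimal sketch. It is not the thesis implementation: the window size, code size, linear actor and critic, the random `W_enc` standing in for a separately trained autoencoder, and the distance-based reward are all assumptions made for illustration. The one part taken from the CACLA algorithm itself is the update rule: the critic is trained on the temporal-difference error, and the actor is moved toward the action actually taken only when that error is positive.

```python
import numpy as np

rng = np.random.default_rng(0)
WIN = 8    # window side length in pixels (assumed)
CODE = 16  # size of the encoding (assumed)

# Hypothetical stand-in for the encoder half of a separately trained autoencoder.
W_enc = rng.normal(0.0, 0.1, (CODE, WIN * WIN))

def clip_pos(image, pos):
    """Round a (row, col) position and keep the window inside the image."""
    max_r = image.shape[0] - WIN
    max_c = image.shape[1] - WIN
    return (int(np.clip(np.rint(pos[0]), 0, max_r)),
            int(np.clip(np.rint(pos[1]), 0, max_c)))

def encode(image, pos):
    """Crop the window at pos and encode it; the code is the agent's state."""
    r, c = pos
    patch = image[r:r + WIN, c:c + WIN].reshape(-1)
    return np.tanh(W_enc @ patch)

class Cacla:
    """CACLA with a linear actor and a linear critic (an assumed simplification)."""

    def __init__(self, n_state, n_action, alpha=0.01, beta=0.01,
                 gamma=0.9, sigma=1.0):
        self.A = np.zeros((n_action, n_state))  # actor weights
        self.v = np.zeros(n_state)              # critic weights
        self.alpha, self.beta = alpha, beta
        self.gamma, self.sigma = gamma, sigma

    def act(self, s):
        # Gaussian exploration around the actor's output.
        return self.A @ s + rng.normal(0.0, self.sigma, self.A.shape[0])

    def update(self, s, a, reward, s_next, done):
        target = reward + (0.0 if done else self.gamma * (self.v @ s_next))
        delta = target - self.v @ s             # temporal-difference error
        self.v += self.beta * delta * s         # critic is always updated
        if delta > 0:                           # actor only if the action did better than expected
            self.A += self.alpha * np.outer(a - self.A @ s, s)
        return delta

def run_episode(image, start, goal, agent, steps=20):
    """Move the window for a few steps; reward is the decrease in distance to goal (assumed)."""
    pos = clip_pos(image, start)
    for _ in range(steps):
        s = encode(image, pos)
        a = agent.act(s)                        # continuous (d_row, d_col) movement
        new_pos = clip_pos(image, (pos[0] + a[0], pos[1] + a[1]))
        d_old = np.hypot(pos[0] - goal[0], pos[1] - goal[1])
        d_new = np.hypot(new_pos[0] - goal[0], new_pos[1] - goal[1])
        done = d_new < 1.0
        agent.update(s, a, d_old - d_new, encode(image, new_pos), done)
        pos = new_pos
        if done:
            break
    return pos
```

Calling `run_episode(image, start, goal, Cacla(CODE, 2))` on a 2-D grayscale array runs one training episode and returns the final window position. Adding position information to the encoding, as the abstract describes, would correspond in this sketch to appending the window coordinates to the vector returned by `encode`.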
