Graph Regularized Auto-Encoders for Image Representation

Image representation has been intensively explored in the domain of computer vision for its significant influence on the relative tasks such as image clustering and classification. It is valuable to learn a low-dimensional representation of an image which preserves its inherent information from the original image space. At the perspective of manifold learning, this is implemented with the local invariant idea to capture the intrinsic low-dimensional manifold embedded in the high-dimensional input space. Inspired by the recent successes of deep architectures, we propose a local invariant deep nonlinear mapping algorithm, called graph regularized auto-encoder (GAE). With the graph regularization, the proposed method preserves the local connectivity from the original image space to the representation space, while the stacked auto-encoders provide explicit encoding model for fast inference and powerful expressive capacity for complex modeling. Theoretical analysis shows that the graph regularizer penalizes the weighted Frobenius norm of the Jacobian matrix of the encoder mapping, where the weight matrix captures the local property in the input space. Furthermore, the underlying effects on the hidden representation space are revealed, providing insightful explanation to the advantage of the proposed method. Finally, the experimental results on both clustering and classification tasks demonstrate the effectiveness of our GAE as well as the correctness of the proposed theoretical analysis, and it also suggests that GAE is a superior solution to the current deep representation learning techniques comparing with variant auto-encoders and existing local invariant methods.

[1]  A. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, CVPR 2004.

[2]  Yann LeCun,et al.  Dimensionality Reduction by Learning an Invariant Mapping , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  Sebastian Nowozin,et al.  On feature combination for multiclass object classification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[4]  Qiang Chen,et al.  Contextualizing Object Detection and Classification , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[5]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Yann LeCun,et al.  Saturating Auto-Encoders , 2013, ICLR 2013.

[7]  Yoshua Bengio,et al.  Classification using discriminative restricted Boltzmann machines , 2008, ICML '08.

[8]  Hossein Mobahi,et al.  Deep Learning via Semi-supervised Embedding , 2012, Neural Networks: Tricks of the Trade.

[9]  Pascal Vincent,et al.  Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion , 2010, J. Mach. Learn. Res..

[10]  Xuelong Li,et al.  Selecting Key Poses on Manifold for Pairwise Action Recognition , 2012, IEEE Transactions on Industrial Informatics.

[11]  Ahmed M. Elgammal,et al.  Inferring 3D body pose from silhouettes using activity manifold learning , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[12]  Steven Henikoff,et al.  SIFT: predicting amino acid changes that affect protein function , 2003, Nucleic Acids Res..

[13]  Yong Liu,et al.  Image Representation Learning Using Graph Regularized Auto-Encoders , 2013, ICLR.

[14]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Mikhail Belkin,et al.  Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering , 2001, NIPS.

[16]  S T Roweis,et al.  Nonlinear dimensionality reduction by locally linear embedding. , 2000, Science.

[17]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Marc'Aurelio Ranzato,et al.  Unsupervised Learning of Invariant Feature Hierarchies with Applications to Object Recognition , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Fuzhen Zhuang,et al.  Embedding with Autoencoder Regularization , 2013, ECML/PKDD.

[20]  Thomas Hofmann,et al.  Greedy Layer-Wise Training of Deep Networks , 2007 .

[21]  Pascal Vincent,et al.  Higher Order Contractive Auto-Encoder , 2011, ECML/PKDD.

[22]  Gang Wang,et al.  Video tracking using learned hierarchical features. , 2015, IEEE transactions on image processing : a publication of the IEEE Signal Processing Society.

[23]  Meng Wang,et al.  Multimodal Deep Autoencoder for Human Pose Recovery , 2015, IEEE Transactions on Image Processing.

[24]  Yoshua Bengio,et al.  Scaling learning algorithms towards AI , 2007 .

[25]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[26]  Pascal Vincent,et al.  A Connection Between Score Matching and Denoising Autoencoders , 2011, Neural Computation.

[27]  Alexander Zien,et al.  Semi-Supervised Learning , 2006 .

[28]  Pascal Vincent,et al.  Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[29]  David J. Field,et al.  Sparse coding with an overcomplete basis set: A strategy employed by V1? , 1997, Vision Research.

[30]  Quoc V. Le,et al.  Measuring Invariances in Deep Networks , 2009, NIPS.

[31]  Xiaojun Wu,et al.  Graph Regularized Nonnegative Matrix Factorization for Data Representation , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[32]  James F. Peters,et al.  Multi-manifold LLE learning in pattern recognition , 2015, Pattern Recognit..

[33]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[34]  Honglak Lee,et al.  Sparse deep belief net model for visual area V2 , 2007, NIPS.

[35]  Paul Geladi,et al.  Principal Component Analysis , 1987, Comprehensive Chemometrics.

[36]  Geoffrey E. Hinton,et al.  Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[37]  Lei Zhang,et al.  Bit-Scalable Deep Hashing With Regularized Similarity Learning for Image Retrieval and Person Re-Identification , 2015, IEEE Transactions on Image Processing.

[38]  Xiaofei He,et al.  Locality Preserving Projections , 2003, NIPS.

[39]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[40]  Kian-Ming Lim,et al.  Self-taught learning of a deep invariant representation for visual tracking via temporal slowness principle , 2015, Pattern Recognit..

[41]  Yoshua Bengio,et al.  Extracting and composing robust features with denoising autoencoders , 2008, ICML '08.

[42]  Lin Sun,et al.  Laplacian Auto-Encoders: An explicit learning of nonlinear data manifold , 2015, Neurocomputing.

[43]  Mikhail Belkin,et al.  Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples , 2006, J. Mach. Learn. Res..

[44]  Liang Lin,et al.  Deep feature learning with relative distance comparison for person re-identification , 2015, Pattern Recognit..

[45]  Geoffrey E. Hinton,et al.  Autoencoders, Minimum Description Length and Helmholtz Free Energy , 1993, NIPS.

[46]  Yoshua Bengio,et al.  What regularized auto-encoders learn from the data-generating distribution , 2012, J. Mach. Learn. Res..

[47]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[48]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.