Non-linear Latent Factor Models for Revealing Structure in High-dimensional Data

Real-world data is not random: the variability in the data sets that arise in computer vision, signal processing and other areas is often highly constrained and governed by a number of degrees of freedom that is much smaller than the superficial dimensionality of the data. Unsupervised learning methods can automatically discover the “true” underlying structure in such data sets and are therefore a central component of many systems that deal with high-dimensional data. In this thesis we develop several new approaches to modeling the low-dimensional structure in data. We introduce a new non-parametric framework for latent variable modeling that, in contrast to previous methods, generalizes learned embeddings beyond the training data and its latent representatives. We show that the computational complexity of learning and applying the model is much smaller than that of existing methods, and we illustrate its applicability on several problems. We also show how supervision signals can be introduced into latent variable models using conditioning. Supervision signals make it possible to attach “meaning” to the axes of a latent representation and to untangle the factors that contribute to the variability in the data. We develop a model that uses conditional latent variables to extract rich distributed representations of image transformations, and we describe a new model for learning transformation features in structured supervised learning problems.
