Transformation-Invariant Clustering and Dimensionality Reduction Using EM

Clustering and dimensionality reduction are simple, effective ways to derive useful representations of data, such as images. These procedures are often used as preprocessing steps for more sophisticated pattern analysis techniques. (In fact, they often perform as well as or better than those techniques.) However, in situations where each input has been randomly transformed (e.g., by translation, rotation, and shearing in images), these methods tend to extract cluster centers and submanifolds that account for variation due to the transformations, instead of more interesting and potentially useful structure. For example, if images of a human face are clustered, it would be more useful for the clusters to represent different poses and expressions, rather than different translations and rotations. We describe a way to add transformation invariance to mixture models, factor analyzers, and mixtures of factor analyzers by approximating the nonlinear transformation manifold with a discrete set of points. In contrast to linear approximations of the transformation manifold, which assume the amount of transformation is small, our method works well for large levels of transformation. We show how the expectation-maximization (EM) algorithm can be used to jointly learn a set of clusters, a subspace model, or a mixture of subspace models while simultaneously inferring the transformation associated with each case. After illustrating this technique on some difficult contrived problems, we compare it with other methods for filtering noisy images obtained from a scanning electron microscope, clustering images of faces into different categories of identity and pose, subspace modeling of facial expressions, subspace modeling of images of handwritten digits for handwriting classification, and unsupervised classification of images of handwritten digits.
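The joint learning described above can be sketched as a "transformed mixture of Gaussians" on a toy problem. The sketch below is an illustrative assumption, not the paper's implementation: it uses 1-D signals with circular shifts as the discrete transformation set, a fixed uniform prior over (cluster, shift) pairs, and a fixed noise variance. The E-step computes responsibilities over every cluster/shift pair; the M-step averages each observation shifted *back* by the inferred transformation, so the learned means recover the untransformed prototypes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 1-D prototype "images" (hypothetical toy data).
D = 16
protos = np.zeros((2, D))
protos[0, 3:6] = 1.0    # a narrow bump
protos[1, 2:10] = 0.5   # a wide plateau

# Each observation is a randomly shifted prototype plus noise.
N = 200
labels = rng.integers(0, 2, N)
shifts = rng.integers(0, D, N)
X = np.stack([np.roll(protos[labels[n]], shifts[n]) for n in range(N)])
X += 0.05 * rng.standard_normal(X.shape)

C, T = 2, D          # number of clusters, number of discrete transformations
sigma2 = 0.1         # noise variance (held fixed for simplicity)
log_pi = np.full((C, T), -np.log(C * T))  # fixed uniform prior over (c, t)

# Shift-invariant farthest-point initialization: second mean is the sample
# farthest (over all shifts) from the first, so the clusters start apart.
d0 = ((X[:, None, :] - np.stack([np.roll(X[0], t) for t in range(T)])[None]) ** 2)
mu = np.stack([X[0], X[d0.sum(-1).min(1).argmax()]])

for _ in range(30):
    # E-step: responsibilities over all (cluster, shift) pairs.
    shifted_mu = np.stack([[np.roll(mu[c], t) for t in range(T)]
                           for c in range(C)])            # (C, T, D)
    d2 = ((X[:, None, None, :] - shifted_mu[None]) ** 2).sum(-1)  # (N, C, T)
    log_r = log_pi[None] - 0.5 * d2 / sigma2
    log_r -= log_r.max(axis=(1, 2), keepdims=True)        # numerical stability
    r = np.exp(log_r)
    r /= r.sum(axis=(1, 2), keepdims=True)

    # M-step: average each x_n shifted back by t, weighted by responsibility.
    unshifted = np.stack([[np.roll(X[n], -t) for t in range(T)]
                          for n in range(N)])             # (N, T, D)
    mu = np.einsum('nct,ntd->cd', r, unshifted) / r.sum(axis=(0, 2))[:, None]
```

Because the E-step marginalizes over the discrete transformation as just another latent variable, each EM iteration costs a factor of T more than ordinary mixture-model EM, which is the trade the abstract describes for handling large (non-infinitesimal) transformations.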
