Paper Information - Fast, Large-Scale Transformation-Invariant Clustering

Fast, Large-Scale Transformation-Invariant Clustering

In previous work on "transformed mixtures of Gaussians" and "transformed hidden Markov models", we showed how the EM algorithm in a discrete latent variable model can be used to jointly normalize data (e.g., center images, pitch-normalize spectrograms) and learn a mixture model of the normalized data. The only inputs to the algorithm are the data, a list of possible transformations, and the number of clusters to find. The main criticism of this work was that the exhaustive computation of the posterior probabilities over transformations would make scaling up to large feature vectors and large sets of transformations intractable. Here, we describe how a tremendous speed-up is achieved through the use of a variational technique for decoupling transformations, and a fast Fourier transform method for computing posterior probabilities. For N × N images, learning C clusters under N rotations, N scales, N x-translations and N y-translations takes only (C + 2 log N)N^2 scalar operations per iteration. In contrast, the original algorithm takes CN^6 operations to account for these transformations. We give results on learning a 4-component mixture model from a video sequence with frames of size 320×240. The model accounts for 360 rotations and 76,800 translations. Each iteration of EM takes only 10 seconds per frame in MATLAB, which is over 5 million times faster than the original algorithm.
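The abstract's "fast Fourier transform method for computing posterior probabilities" can be illustrated with a minimal sketch. It is not the authors' implementation. It covers only cyclic translations, and it assumes a single cluster mean, isotropic Gaussian noise with variance `sigma2`, and a uniform prior over shifts; the variational decoupling and the handling of rotation and scale are not shown. The function names `shift_posterior_fft` and `shift_posterior_brute` are hypothetical. The idea is that the squared distance between an image and every shifted copy of the mean depends on the shift only through a cross-correlation, which one FFT pass computes for all shifts at once.

```python
import numpy as np


def shift_posterior_fft(x, mu, sigma2=1.0):
    """Posterior over all cyclic 2-D shifts of `mu` given image `x`.

    Assumes isotropic Gaussian noise (variance sigma2) and a uniform
    prior over shifts. Cost is O(HW log(HW)) instead of O((HW)^2).
    """
    # ||x - shift(mu, t)||^2 = ||x||^2 + ||mu||^2 - 2 <x, shift(mu, t)>
    # The inner product for every shift t is a circular cross-correlation.
    X = np.fft.fft2(x)
    M = np.fft.fft2(mu)
    corr = np.real(np.fft.ifft2(X * np.conj(M)))  # corr[t] = <x, roll(mu, t)>
    sq_dist = np.sum(x ** 2) + np.sum(mu ** 2) - 2.0 * corr
    log_p = -0.5 * sq_dist / sigma2
    log_p -= log_p.max()  # subtract the max for numerical stability
    p = np.exp(log_p)
    return p / p.sum()


def shift_posterior_brute(x, mu, sigma2=1.0):
    """Same posterior by exhaustive enumeration, for comparison only."""
    H, W = x.shape
    log_p = np.empty((H, W))
    for dy in range(H):
        for dx in range(W):
            diff = x - np.roll(mu, (dy, dx), axis=(0, 1))
            log_p[dy, dx] = -0.5 * np.sum(diff ** 2) / sigma2
    log_p -= log_p.max()
    p = np.exp(log_p)
    return p / p.sum()
```

Entry `[dy, dx]` of the returned array is the posterior probability that `x` is `mu` cyclically shifted by `dy` rows and `dx` columns. The brute-force version touches every pixel for every shift, which is the kind of exhaustive cost the abstract describes as the main criticism of the earlier algorithm.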

Brendan J. Frey | Nebojsa Jojic
