On Affine Invariant Clustering and Automatic Cast Listing in Movies

We develop a distance metric for clustering and classification algorithms which is invariant to affine transformations and includes priors on the transformation parameters. Such clustering requirements are generic to a number of problems in computer vision.We extend existing techniques for affine-invariant clustering, and show that the new distance metric outperforms existing approximations to affine invariant distance computation, particularly under large transformations. In addition, we incorporate prior probabilities on the transformation parameters. This further regularizes the solution, mitigating a rare but serious tendency of the existing solutions to diverge. For the particular special case of corresponding point sets we demonstrate that the affine invariant measure we introduced may be obtained in closed form.As an application of these ideas we demonstrate that the faces of the principal cast of a feature film can be generated automatically using clustering with appropriate invariance. This is a very demanding test as it involves detecting and clustering over tens of thousands of images with the variances including changes in viewpoint, lighting, scale and expression.

[1]  Richard H. Byrd,et al.  A Trust Region Algorithm for Nonlinearly Constrained Optimization , 1987 .

[2]  Pietro Perona,et al.  A Probabilistic Approach to Object Recognition Using Local Photometry and Global Geometry , 1998, ECCV.

[3]  Michal Irani,et al.  Multi-frame optical flow estimation using subspace constraints , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[4]  Andrew Zisserman,et al.  Feature Based Methods for Structure and Motion Estimation , 1999, Workshop on Vision Algorithms.

[5]  David J. Field,et al.  Emergence of simple-cell receptive field properties by learning a sparse code for natural images , 1996, Nature.

[6]  Daniel P. Fasulo,et al.  An Analysis of Recent Work on Clustering Algorithms , 1999 .

[7]  Nuno Vasconcelos,et al.  Multiresolution Tangent Distance for Affine-invariant Classification , 1997, NIPS.

[8]  Cordelia Schmid,et al.  Constructing models for content-based image retrieval , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[9]  Cordelia Schmid,et al.  Face detection in a video sequence - a temporal approach , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[10]  Michael J. Black,et al.  Robust Principal Component Analysis for Computer Vision , 2001, ICCV.

[11]  Michael J. Black,et al.  Learning image statistics for Bayesian tracking , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[12]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[13]  Yali Amit,et al.  A Computational Model for Visual Selection , 1999, Neural Computation.

[14]  Takeo Kanade,et al.  Shape and motion from image streams under orthography: a factorization method , 1992, International Journal of Computer Vision.

[15]  Yann LeCun,et al.  Transformation Invariance in Pattern Recognition-Tangent Distance and Tangent Propagation , 1996, Neural Networks: Tricks of the Trade.

[16]  Erik Hjelmås,et al.  Face Detection: A Survey , 2001, Comput. Vis. Image Underst..

[17]  William H. Press,et al.  Numerical recipes in C , 2002 .

[18]  Henry Schneiderman,et al.  A histogram-based method for detection of faces and cars , 2000, Proceedings 2000 International Conference on Image Processing (Cat. No.00CH37101).

[19]  P. Anandan,et al.  About Direct Methods , 1999, Workshop on Vision Algorithms.

[20]  Bernard Victorri,et al.  Transformation invariance in pattern recognition: Tangent distance and propagation , 2000 .

[21]  Ali S. Hadi,et al.  Finding Groups in Data: An Introduction to Chster Analysis , 1991 .

[22]  Richard Szeliski,et al.  Vision Algorithms: Theory and Practice , 2002, Lecture Notes in Computer Science.

[23]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[24]  Jitendra Malik,et al.  Representing and Recognizing the Visual Appearance of Materials using Three-dimensional Textons , 2001, International Journal of Computer Vision.

[25]  K. Mardia,et al.  Statistical Shape Analysis , 1998 .

[26]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[27]  Yann LeCun,et al.  Efficient Pattern Recognition Using a New Transformation Distance , 1992, NIPS.

[28]  Andrew Blake,et al.  Probabilistic tracking in a metric space , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[29]  Jitendra Malik,et al.  Recognizing surfaces using three-dimensional textons , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[30]  Nicholas I. M. Gould,et al.  Trust Region Methods , 2000, MOS-SIAM Series on Optimization.

[31]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[32]  Bernhard Schölkopf,et al.  Incorporating Invariances in Support Vector Learning Machines , 1996, ICANN.

[33]  Brendan J. Frey,et al.  Transformed component analysis: joint estimation of spatial transformations and image components , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.