Unsupervised Discovery of Action Classes

In this paper we consider the problem of describing the action being performed by human figures in still images. We will attack this problem using an unsupervised learning approach, attempting to discover the set of action classes present in a large collection of training images. These action classes will then be used to label test images. Our approach uses the coarse shape of the human figures to match pairs of images. The distance between a pair of images is computed using a linear programming relaxation technique. This is a computationally expensive process, and we employ a fast pruning method to enable its use on a large collection of images. Spectral clustering is then performed using the resulting distances. We present clustering and image labeling results on a variety of datasets.

[1]  Mubarak Shah,et al.  Motion-Based Recognition , 1997, Computational Imaging and Vision.

[2]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  J. Little,et al.  Recognizing People by Their Gait: The Shape of Motion , 1998 .

[4]  Dariu Gavrila,et al.  Real-time object detection for "smart" vehicles , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[5]  Irfan A. Essa,et al.  Exploiting human actions and object context for recognition tasks , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[6]  Xavier Binefa,et al.  Robust Real-Time Periodic Motion Detection, Analysis, and Applications , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Jesse Hoey,et al.  Hierarchical unsupervised learning of facial expression categories , 2001, Proceedings IEEE Workshop on Detection and Recognition of Events in Video.

[8]  James W. Davis,et al.  The Recognition of Human Movement Using Temporal Templates , 2001, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Stefan Carlsson,et al.  Recognizing and Tracking Human Action , 2002, ECCV.

[10]  Jianbo Shi,et al.  Detecting unusual activity in video , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[11]  Mubarak Shah,et al.  View-Invariant Representation and Recognition of Actions , 2002, International Journal of Computer Vision.

[12]  Randal C. Nelson,et al.  Detection and Recognition of Periodic, Nonrigid Motion , 1997, International Journal of Computer Vision.

[13]  Tamara L. Berg,et al.  Names and faces in the news , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[14]  Pietro Perona,et al.  Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[15]  Jitendra Malik,et al.  Efficient shape matching using shape contexts , 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Michal Irani,et al.  Detecting Irregularities in Images and in Video , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Alexei A. Efros,et al.  Discovering objects and their location in images , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[18]  Ze-Nian Li,et al.  Linear Programming Matching and Appearance-Adaptive Object Tracking , 2005, EMMCVPR.

[19]  Shaogang Gong,et al.  Video behaviour profiling and abnormality detection without manual labelling , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.