Dual soft assignment clustering algorithm for human action video clustering

Abstract: Dual assignment clustering (DAC) was recently proposed in computer vision and has been shown to improve accuracy on action clustering tasks. The key idea of DAC is to consider a second view (different from the original features) of the same set of samples, and to exploit the statistical correlation between the cluster assignments in the two views. However, the existing optimization is heuristic, mainly because hard cluster assignment leads to a difficult combinatorial optimization problem. In this paper, we introduce a novel DAC optimization algorithm based on a probabilistic (soft) treatment, in which the proposed objective function incorporates both the goodness of clustering in each view and the correlation between the two views in a more principled and theoretically sound fashion. We also propose a lower-bound maximization technique that not only admits fast per-iteration solutions but also guarantees convergence to a local optimum. The superiority of the proposed approach over existing methods is demonstrated on several activity video datasets.
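
To make the notions of soft assignment and cross-view correlation concrete, the following is a minimal illustrative sketch in Python. It is not the objective or the lower-bound optimization proposed in the paper, which the abstract does not spell out. The function names `soft_assign` and `view_agreement`, the softmax-over-distances assignment, the `temperature` parameter, and the use of mutual information as the agreement score are all assumptions chosen for illustration.

```python
import numpy as np

def soft_assign(X, centers, temperature=1.0):
    """Soft cluster assignments: row i is a distribution over K clusters.

    Assumption: softmax of negative squared distances to given centers.
    """
    # Squared Euclidean distances between samples and centers, shape (n, K).
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def view_agreement(P1, P2, eps=1e-12):
    """Mutual information between the soft assignments of two views.

    Assumption: a stand-in measure of cross-view correlation.
    """
    # Joint distribution over (cluster in view 1, cluster in view 2).
    J = P1.T @ P2 / P1.shape[0]
    m1 = J.sum(axis=1, keepdims=True)  # marginal of view 1, shape (K, 1)
    m2 = J.sum(axis=0, keepdims=True)  # marginal of view 2, shape (1, K)
    return float((J * (np.log(J + eps) - np.log(m1 @ m2 + eps))).sum())

# Toy data: 40 samples, two views, two clusters (all values hypothetical).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = np.repeat([0, 1], 20)   # grouping seen in view 1
other = np.arange(40) % 2        # a grouping unrelated to `labels`

X1 = centers[labels] + 0.5 * rng.standard_normal((40, 2))
X2 = centers[labels] + 0.5 * rng.standard_normal((40, 2))
X2_mismatched = centers[other] + 0.5 * rng.standard_normal((40, 2))

P1 = soft_assign(X1, centers)
aligned = view_agreement(P1, soft_assign(X2, centers))
mismatched = view_agreement(P1, soft_assign(X2_mismatched, centers))
```

In this toy setup the agreement score is high when both views group the samples the same way and close to zero when the second view's grouping is unrelated to the first. A dual-assignment objective of the kind described above would combine such a cross-view term with a per-view clustering quality term.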
