Coaction discovery: segmentation of common actions across multiple videos

We introduce a new problem called coaction discovery: the task of discovering and segmenting the common actions (coactions) between videos that may contain several actions. This paper presents an approach for coaction discovery; the key idea of our approach is to compute an action proposal map for each video based jointly on dynamic object-motion and static appearance semantics, and unsupervisedly cluster each video into atomic action clips, called actoms. Subsequently, we use a temporally coherent discriminative clustering framework for extracting the coactions. We apply our coaction discovery approach to two datasets and demonstrate convincing and superior performance to three baseline methods.

[1]  Fernando De la Torre,et al.  Unsupervised discovery of facial events , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Andrew Blake,et al.  Cosegmentation of Image Pairs by Histogram Matching - Incorporating a Global Constraint into MRFs , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[3]  Luc Van Gool,et al.  A Hough transform-based voting framework for action recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[4]  C. Bregler,et al.  Large displacement optical flow , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[6]  Barbara Caputo,et al.  Recognizing human actions: a local SVM approach , 2004, Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004..

[7]  Vikas Singh,et al.  An efficient algorithm for Co-segmentation , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[8]  Zaïd Harchaoui,et al.  DIFFRAC: a discriminative and flexible framework for clustering , 2007, NIPS.

[9]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[10]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[11]  Jean Ponce,et al.  Discriminative clustering for image co-segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Mubarak Shah,et al.  Action MACH a spatio-temporal Maximum Average Correlation Height filter for action recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Yihong Gong,et al.  Action detection in complex scenes with spatial and temporal ambiguities , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[14]  Cordelia Schmid,et al.  A Spatio-Temporal Descriptor Based on 3D-Gradients , 2008, BMVC.

[15]  Takeo Kanade,et al.  Background Subtraction for Freely Moving Cameras , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[16]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[17]  Fei-Fei Li,et al.  Modeling mutual context of object and human pose in human-object interaction activities , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[18]  Ronald Poppe,et al.  A survey on vision-based human action recognition , 2010, Image Vis. Comput..

[19]  Ying Wu,et al.  Discriminative subvolume search for efficient action detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Cordelia Schmid,et al.  Actom sequence models for efficient action detection , 2011, CVPR 2011.