Paper information - Structured Models for Action Recognition in Real-world Videos

Structured Models for Action Recognition in Real-world Videos

This dissertation introduces novel models to recognize broad action categories, such as "opening a door" and "running", in real-world video data such as movies and internet videos. In particular, we investigate how an action can be decomposed, what its discriminative structure is, and how to use this information to accurately represent video content. The main challenge we address is how to build action models that are simultaneously information-rich, in order to correctly differentiate between action categories, and robust to the large variations in actors, actions, and videos present in real-world data. We design three robust models capturing both the content of and the relations between action parts. Our approach consists in structuring collections of robust local features, such as spatio-temporal interest points and short-term point trajectories. We also propose efficient kernels to compare our structured action representations. Although they share the same principles, our methods differ in the type of problem they address and the structural information they rely on. First, we propose to model a simple action as a sequence of meaningful atomic temporal parts. We show how to learn a flexible model of the temporal structure and how to use it for action localization in long unsegmented videos. Then, extending our ideas to the spatio-temporal structure of more complex activities, we describe a large-scale unsupervised learning algorithm that hierarchically decomposes the motion content of videos. We leverage the resulting tree-structured decompositions to build hierarchical action models and provide an action kernel between unordered binary trees of arbitrary sizes. Finally, instead of structuring action models, we explore another route: directly comparing models of the structure.
We view short-duration actions as high-dimensional time series and investigate how an action's temporal dynamics can complement state-of-the-art unstructured models for action classification. We propose an efficient kernel to compare the temporal dependencies between two actions and show that it provides useful complementary information to the traditional bag-of-features approach. In all three cases, we conducted thorough experiments on some of the most challenging benchmarks used by the action recognition community. We show that each of our methods significantly outperforms the related state of the art, thus highlighting the importance of structural information for accurate and robust action recognition in real-world videos.

Adrien Gaidon
