Investigating time-sensitive topic model approaches for action recognition

Abstract

In this paper, we present several attempts at using topic models for action recognition in videos. We show that time-sensitive topic models help recognize actions when little training data is available. We also show some limitations of these models when dealing with complex videos. New applications of these models in semi-supervised settings and the use of inherently discriminant models such as MedLDA are also considered.

1 Introduction

Action recognition is an important field of video processing. Its applications cover, among others, automatic annotation of videos, improved human-computer interaction and guidance in monitoring public spaces. Most state-of-the-art techniques for action recognition in video documents rely on Bag-of-Word (BoW) representations. The latter are built from quantized spatio-temporal descriptors collected over long video segments [10, 14, 12, 13]. Such methods, however, do not encode the time information, although actions are characterized by strong temporal components. To address this issue and enhance action recognition performance, we investigate the use of novel principled probabilistic methods (called topic models) for capturing the temporal relationships between characteristic sub-units of a given action. In a previous paper [17], we showed that these models could help action recognition when little training information is available. In the following, we present further experiments to better identify the strengths and weaknesses of these models when used for action recognition.
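To make concrete the point that a BoW representation discards temporal order, the following is a minimal sketch, not the pipeline used in this paper. It assumes a hypothetical three-word codebook and four toy 2-D descriptors; the helper names `quantize` and `bow_histogram` are illustrative only. Each descriptor is assigned to its nearest codeword, and the histogram of word counts is computed over the whole segment.

```python
from collections import Counter

def quantize(descriptors, codebook):
    """Map each descriptor to the index of its nearest codeword (squared Euclidean distance)."""
    def nearest(d):
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in codebook]
        return dists.index(min(dists))
    return [nearest(d) for d in descriptors]

def bow_histogram(words, vocab_size):
    """Normalized visual-word counts over a whole segment; temporal order is discarded."""
    counts = Counter(words)
    total = len(words)
    return [counts[w] / total for w in range(vocab_size)]

# Hypothetical toy data (assumption): 3 codewords in a 2-D descriptor space,
# and 4 descriptors listed in the order they occur in the video segment.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]

words = quantize(descriptors, codebook)   # one word index per descriptor, in temporal order
reversed_words = words[::-1]              # same words played in the opposite order

h_forward = bow_histogram(words, len(codebook))
h_backward = bow_histogram(reversed_words, len(codebook))

# Both orderings yield the same histogram, so a classifier fed only the BoW
# vector cannot tell the segment apart from its time-reversed version.
print(words, h_forward, h_forward == h_backward)
```

The two histograms are identical even though the word sequences differ, which is the loss of time information that motivates modelling temporal relationships between sub-units of an action.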

[1] David M. Blei, et al. Supervised Topic Models, 2007, NIPS.

[2] Eric P. Xing, et al. MedLDA: maximum margin supervised topic models, 2012, J. Mach. Learn. Res.

[3] A. McCallum, et al. Topical N-Grams: Phrase and Topic Discovery, with an Application to Information Retrieval, 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).

[4] Thomas Hofmann, et al. Unsupervised Learning by Probabilistic Latent Semantic Analysis, 2004, Machine Learning.

[5] W. Eric L. Grimson, et al. Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6] Michael I. Jordan, et al. DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification, 2008, NIPS.

[7] Daniel Gatica-Perez, et al. What did you do today?: discovering daily routines from large-scale mobile data, 2008, ACM Multimedia.

[8] Juan Carlos Niebles, et al. Modeling Temporal Structure of Decomposable Motion Segments for Activity Classification, 2010, ECCV.

[9] Bernt Schiele, et al. Discovery of activity patterns using topic models, 2008.

[10] Cordelia Schmid, et al. Learning realistic human actions from movies, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[11] Ronald Poppe, et al. A survey on vision-based human action recognition, 2010, Image Vis. Comput.

[12] Jean-Marc Odobez, et al. Time-sensitive topic models for action recognition in videos, 2013, 2013 IEEE International Conference on Image Processing.

[13] Juan Carlos Niebles, et al. Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, 2008, International Journal of Computer Vision.

[14] Jean-Marc Odobez, et al. Probabilistic Latent Sequential Motifs: Discovering Temporal Activity Patterns in Video Scenes, 2010, BMVC.

[15] Cordelia Schmid, et al. Action recognition by dense trajectories, 2011, CVPR 2011.

[16] Jean-Marc Odobez, et al. Extracting and locating temporal motifs in video scenes using a hierarchical non parametric Bayesian model, 2011, CVPR 2011.

[17] Juan Carlos Niebles, et al. Unsupervised Learning of Human Action Categories Using Spatial-Temporal Words, 2006, BMVC.

[18] John D. Lafferty, et al. Dynamic topic models, 2006, ICML.

[19] Cristian Sminchisescu, et al. Conditional models for contextual human motion recognition, 2006, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[20] Ivan Laptev, et al. On Space-Time Interest Points, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[21] Junji Yamato, et al. Recognizing human action in time-sequential images using hidden Markov model, 1992, Proceedings 1992 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[22] Cordelia Schmid, et al. Evaluation of Local Spatio-temporal Features for Action Recognition, 2009, BMVC.

[23] Michael I. Jordan, et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[24] Fei-Fei Li, et al. Action Recognition with Exemplar Based 2.5D Graph Matching, 2012, ECCV.

[25] Cordelia Schmid, et al. Actions in context, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.