Combining Per-frame and Per-track Cues for Multi-person Action Recognition

We propose a model that combines per-frame and per-track cues for action recognition. With multiple targets in a scene, our model simultaneously captures the natural harmony of an individual's action within the scene and the flow of an individual's actions through the video sequence, inferring valid tracks in the process. Our motivation is that an action is unlikely to be discordant with a structured scene, at both the track level and the frame level (e.g., a person dancing in a crowd of joggers). Although sampling approaches could be used for inference in our model, we instead devise a global inference algorithm that decomposes the problem, solves the subproblems exactly and efficiently, and in several cases recovers a globally optimal joint solution. Finally, we improve on state-of-the-art action recognition results on two publicly available datasets.
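The decomposition idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes dual decomposition with subgradient updates as the coupling scheme, fixed and known tracks (the paper's model also infers tracks), and every person visible in every frame. It also uses two hypothetical cues: a per-frame bonus `w_frame` for each person who shares the frame's dominant "scene action", and a per-track bonus `w_track` for each pair of consecutive frames with the same label. Each subproblem is solved exactly, by enumerating the scene action for frames and by Viterbi for tracks. Whenever the two copies of the labels agree, the shared labeling is certified globally optimal, although agreement is not guaranteed. All function and parameter names are hypothetical.

```python
def solve_frame(theta, w_frame):
    """Exact per-frame subproblem (hypothetical frame cue).

    theta[i][k]: score of person i taking action k in this frame.
    A frame-level 'scene action' c is chosen and every person whose
    label equals c earns w_frame; enumerating c makes this exact.
    """
    K = len(theta[0])
    best_val, best_labels = float("-inf"), None
    for c in range(K):
        labels, val = [], 0.0
        for scores in theta:
            adjusted = [scores[k] + (w_frame if k == c else 0.0) for k in range(K)]
            k = max(range(K), key=adjusted.__getitem__)
            labels.append(k)
            val += adjusted[k]
        if val > best_val:
            best_val, best_labels = val, labels
    return best_val, best_labels


def solve_track(theta, w_track):
    """Exact per-track subproblem: Viterbi on a chain.

    theta[t][k]: score of this track taking action k at frame t.
    Consecutive frames with the same label earn w_track.
    """
    K = len(theta[0])
    score, back = list(theta[0]), []
    for t in range(1, len(theta)):
        new, prev = [], []
        for k in range(K):
            cand = [score[j] + (w_track if j == k else 0.0) for j in range(K)]
            j = max(range(K), key=cand.__getitem__)
            prev.append(j)
            new.append(cand[j] + theta[t][k])
        score = new
        back.append(prev)
    k = max(range(K), key=score.__getitem__)
    best, labels = score[k], [k]
    for prev in reversed(back):  # backtrack from the last frame
        k = prev[k]
        labels.append(k)
    return best, labels[::-1]


def objective(unary, labels, w_frame, w_track):
    """Joint score: unaries plus frame-level and track-level agreement bonuses."""
    T, N, K = len(unary), len(unary[0]), len(unary[0][0])
    total = sum(unary[t][i][labels[t][i]] for t in range(T) for i in range(N))
    total += sum(w_frame * max(row.count(c) for c in range(K)) for row in labels)
    total += sum(w_track for t in range(T - 1) for i in range(N)
                 if labels[t][i] == labels[t + 1][i])
    return total


def dual_decomposition(unary, w_frame, w_track, iters=100, step=1.0):
    """Couple the two exact subproblems through multipliers lam[t][i][k]."""
    T, N, K = len(unary), len(unary[0]), len(unary[0][0])
    lam = [[[0.0] * K for _ in range(N)] for _ in range(T)]
    for it in range(iters):
        bound, frame_labels = 0.0, []
        for t in range(T):  # frame side sees unary/2 + lam
            theta = [[unary[t][i][k] / 2 + lam[t][i][k] for k in range(K)] for i in range(N)]
            val, labs = solve_frame(theta, w_frame)
            bound += val
            frame_labels.append(labs)
        track_labels = [[0] * N for _ in range(T)]
        for i in range(N):  # track side sees unary/2 - lam
            theta = [[unary[t][i][k] / 2 - lam[t][i][k] for k in range(K)] for t in range(T)]
            val, labs = solve_track(theta, w_track)
            bound += val
            for t in range(T):
                track_labels[t][i] = labs[t]
        if frame_labels == track_labels:
            return frame_labels, bound, True  # agreement certifies optimality
        alpha = step / (it + 1)  # diminishing subgradient step
        for t in range(T):
            for i in range(N):
                a, b = frame_labels[t][i], track_labels[t][i]
                if a != b:  # push the two label copies toward each other
                    lam[t][i][a] -= alpha
                    lam[t][i][b] += alpha
    return frame_labels, bound, False


# Toy input: 3 frames, 2 people, 2 actions; unary[t][i][k].
unary = [
    [[2, 0], [2, 0]],
    [[2, 0], [0, 0.5]],  # person 1 looks slightly like action 1 here
    [[2, 0], [2, 0]],
]
labels, bound, ok = dual_decomposition(unary, w_frame=1.0, w_track=1.0)
print(labels, bound, ok)  # [[0, 0], [0, 0], [0, 0]] 20.0 True
```

In the toy input, person 1's unary scores alone favor action 1 in the middle frame. With both cues switched on, the frame and track subproblems agree on action 0 for everyone, and the returned bound equals the joint objective. Setting `w_frame=0.0, w_track=0.0` recovers the per-detection argmax, `[[0, 0], [0, 1], [0, 0]]`.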

Larry S. Davis | Vlad I. Morariu | Sameh Khamis
