Jointly social grouping and identification in visual dynamics with causality-induced hierarchical Bayesian model

Abstract We focus on modeling person-person interactions for group activity recognition. To address the complexity and ambiguity caused by a large number of people in a scene, we propose a causality-induced hierarchical Bayesian model for interaction activity videos that captures “what” interaction activities happen, “where” atomic interactions occur in space, and “when” group interactions happen in time. In particular, Granger causality is characterized with multiple features to encode the interaction relationships between individuals in the group. Furthermore, to detect and identify concurrent interactions simultaneously, we use relative entropy as a metric to measure the motion dependency between any two individuals. After filtering by causality dependency, the causality motion features are cast as multiplicative probabilistic factors in the Bayesian model, which aggregates them into compact latent interaction patterns with discriminative power. Experiments demonstrate that our model outperforms state-of-the-art models.
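
The two pairwise measures named in the abstract can be illustrated with a minimal sketch. This is not the paper's formulation, which the abstract does not specify; it assumes each individual is summarized by a one-dimensional motion signal, uses a plain linear autoregressive Granger test, and computes relative entropy between two discrete motion histograms. The function names (`lagged_matrix`, `granger_score`, `relative_entropy`) and the `lag` and `eps` parameters are hypothetical.

```python
import numpy as np


def lagged_matrix(series, lag):
    """Rows are [s[t-1], ..., s[t-lag]] for t = lag .. len(series)-1."""
    n = len(series)
    return np.column_stack([series[lag - k:n - k] for k in range(1, lag + 1)])


def granger_score(x, y, lag=2, eps=1e-12):
    """Log ratio of residual sums of squares for predicting y.

    Compares a model using only the past of y against a model that
    also uses the past of x. Larger values suggest that x helps to
    predict y (x "Granger-causes" y under this linear assumption).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    target = y[lag:]
    intercept = np.ones((len(target), 1))
    restricted = np.hstack([intercept, lagged_matrix(y, lag)])
    full = np.hstack([restricted, lagged_matrix(x, lag)])

    def rss(design):
        # Ordinary least squares fit, then residual sum of squares.
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residual = target - design @ coef
        return float(residual @ residual)

    return float(np.log((rss(restricted) + eps) / (rss(full) + eps)))


def relative_entropy(p, q, eps=1e-12):
    """Relative entropy D(p || q) between two discrete histograms."""
    p = np.asarray(p, dtype=float) + eps  # eps avoids log(0)
    q = np.asarray(q, dtype=float) + eps
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(p * np.log(p / q)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    leader = rng.standard_normal(300)
    follower = np.zeros(300)
    # Synthetic assumption: the follower copies the leader one step later.
    follower[1:] = 0.8 * leader[:-1] + 0.1 * rng.standard_normal(299)
    print("leader -> follower:", granger_score(leader, follower))
    print("follower -> leader:", granger_score(follower, leader))
    print("D(p || q):", relative_entropy([0.5, 0.5], [0.9, 0.1]))
```

On the synthetic pair, the score in the leader-to-follower direction should be clearly larger than in the reverse direction, which is the kind of asymmetry a causality filter over pairs of individuals would rely on.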
