Unsupervised Framework for Interactions Modeling between Multiple Objects

Extracting compound interactions involving multiple objects is a challenging task in computer vision because of mutual occlusions between objects, varying group size, and issues arising from the tracker. Additionally, single-object activities are less common than activities performed by two or more objects, e.g., gathering, fighting, running, etc. The purpose of this paper is to address the problem of interaction recognition among multiple objects based on dynamic features in an unsupervised manner. Our main contribution is twofold. First, we introduce a combined framework that uses tracking-by-detection for trajectory extraction and hierarchical Dirichlet processes (HDPs) for latent interaction extraction. Second, we introduce a new dataset, the Cavy dataset. The Cavy dataset contains about six dominant interactions performed several times by two or three cavies at different locations. The cavies interact in complicated and unexpected ways, so many interactions occur within a short time, which makes this dataset more challenging to work with. The experiments in this study are performed not only on the Cavy dataset but also on the benchmark dataset Behave. The experiments on these datasets demonstrate the effectiveness of the proposed method. Although our approach is completely unsupervised, we achieved satisfactory results, with a clustering accuracy of up to 68.84% on the Behave dataset and up to 45% on the
