Labeling subtle conversational interactions within the CONVERSE dataset

The field of Human Action Recognition has expanded greatly in recent years, exploring actions and interactions between individuals through appearance- and depth-based pose information. Numerous datasets present action classes composed of behaviors that are well defined by their key poses, such as ‘kicking’ and ‘punching’. The CONVERSE dataset instead presents conversational interaction classes that show little explicit relation to the poses and gestures they exhibit. Such a complex and subtle set of interactions poses a novel challenge to the Human Action Recognition community, and one that will push the cutting edge of the field in both machine learning and the understanding of human actions. CONVERSE contains recordings of two-person interactions from 7 conversational scenarios, represented as sequences of human skeletal poses captured by the Kinect depth sensor. In this study we discuss a method for providing ground truth labeling for the set, and the complexity that comes with defining such annotation. The CONVERSE dataset is made available online.
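To make the data representation concrete, the following is a minimal sketch of how a two-person skeletal pose sequence with per-frame ground truth labels might be organised. It is illustrative only: the array layout, the 20-joint Kinect v1 skeleton, the frame rate, and the function names are assumptions for demonstration, not the dataset's documented schema.

```python
import numpy as np

# Assumption: Kinect v1 skeleton tracking with 20 joints per person,
# each joint stored as an (x, y, z) position in sensor coordinates.
NUM_JOINTS = 20
NUM_PEOPLE = 2


def make_interaction_clip(num_frames, seed=None):
    """Create a placeholder two-person pose sequence shaped
    (frames, people, joints, xyz), as a CONVERSE-style clip might be."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(num_frames, NUM_PEOPLE, NUM_JOINTS, 3))


def frame_level_labels(num_frames, interaction_class):
    """Assign one class label (0-6 for 7 scenarios) to every frame,
    the simplest ground-truth scheme for a single-interaction clip."""
    return np.full(num_frames, interaction_class, dtype=np.int64)


if __name__ == "__main__":
    clip = make_interaction_clip(num_frames=150, seed=0)  # ~5 s at 30 fps
    labels = frame_level_labels(len(clip), interaction_class=3)
    print(clip.shape, labels.shape)  # (150, 2, 20, 3) (150,)
```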
