Understanding social relationships in egocentric vision

Understanding mutual interactions between people is a key component of recognizing social behavior, but it strongly depends on a personal point of view, which makes it difficult to model a priori. We propose exploiting the unique first-person perspective of head-mounted cameras (ego-vision) to promptly detect interactions between people in different social contexts. The proposal relies on a complete and reliable system that estimates people's head pose by combining landmarks and shape descriptors in a temporally smoothed HMM framework. Finally, interactions are detected through supervised clustering on mutual head orientations and interpersonal distances, exploiting a structural learning framework that adapts the clustering measure to the specific scenario. Our solution is flexible enough to capture interactions regardless of the number of individuals involved and their level of acquaintance, in contexts with varying degrees of social involvement. The proposed system achieves competitive performance on both publicly available ego-vision datasets and ad hoc benchmarks built from real-life situations.

Highlights

- A head pose estimation method designed to work in ego-vision scenarios is provided.
- We define a 3D people localization method that works without any camera calibration.
- We estimate social groups with supervised correlation clustering and structural SVM.
- An evaluation of state-of-the-art trackers applied to first-person videos is provided.
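The temporal smoothing of head pose can be illustrated with a minimal sketch, under assumptions that are not taken from the paper: head pose is discretized into a few ordered yaw classes, a per-frame classifier (not shown) outputs a probability for each class, those probabilities are used directly as emission scores (a common simplification), and the HMM transition matrix favours staying in the same or an adjacent class. The class set, the transition values and the function names `sticky_transition` and `viterbi_smooth` are hypothetical. Viterbi decoding then returns the most likely pose sequence.

```python
import numpy as np

def sticky_transition(K, stay=0.8, neighbour=0.08, leak=0.001):
    """Hypothetical transition matrix over K ordered yaw classes: the head pose
    most likely stays in the same class or moves to an adjacent one."""
    A = np.full((K, K), leak)
    for i in range(K):
        A[i, i] = stay
        for j in (i - 1, i + 1):
            if 0 <= j < K:
                A[i, j] = neighbour
    return A / A.sum(axis=1, keepdims=True)  # each row is a probability distribution

def viterbi_smooth(frame_probs, transition, prior):
    """Most likely sequence of pose classes given per-frame class probabilities.

    frame_probs: (T, K) per-frame classifier outputs, used as emission scores.
    transition:  (K, K), transition[i, j] = P(class j at t | class i at t-1).
    prior:       (K,) distribution of the class in the first frame.
    """
    T, K = frame_probs.shape
    eps = 1e-12  # avoids log(0)
    log_e = np.log(frame_probs + eps)
    log_a = np.log(transition + eps)
    score = np.log(prior + eps) + log_e[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_a  # cand[i, j]: best path ending in i, then i -> j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(K)] + log_e[t]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):  # follow back-pointers down to frame 0
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Five hypothetical yaw classes (e.g. -90, -45, 0, +45, +90 degrees);
# frame 2 contains a spurious detection of class 4.
steady = [0.05, 0.10, 0.70, 0.10, 0.05]
frame_probs = np.array([steady, steady, [0.02, 0.03, 0.30, 0.05, 0.60], steady, steady])
raw = np.argmax(frame_probs, axis=1).tolist()
smoothed = viterbi_smooth(frame_probs, sticky_transition(5), np.full(5, 1 / 5))
print(raw, smoothed)  # [2, 2, 4, 2, 2] [2, 2, 2, 2, 2]
```

The isolated class-4 frame is overridden because jumping two classes away and back costs far more under the transition matrix than the lower emission score of class 2 in that single frame.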

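The interaction-detection step can likewise be sketched under assumptions that are not taken from the paper: each person is described by a ground-plane position and a head yaw angle, the pairwise features are interpersonal distance plus a mutual-facing score, the weight vector `W_HYP` is fixed by hand instead of being learned with a structural SVM, and inference uses a simple greedy agglomerative heuristic for correlation clustering. All names are hypothetical.

```python
import numpy as np

# Hypothetical weights for [distance, mutual facing, bias]. In a supervised setting
# they would be learned (e.g. with a structural SVM), not fixed by hand.
W_HYP = np.array([-1.0, 1.0, 1.0])

def pair_features(p_i, yaw_i, p_j, yaw_j):
    """Pairwise features from ground-plane positions (metres) and head yaw (radians)."""
    d_vec = p_j - p_i
    dist = np.linalg.norm(d_vec)
    u = d_vec / dist  # unit vector from i to j
    h_i = np.array([np.cos(yaw_i), np.sin(yaw_i)])
    h_j = np.array([np.cos(yaw_j), np.sin(yaw_j)])
    # Cosine between each head direction and the direction to the other person:
    # +2 when the two face each other, -2 when both look away from each other.
    facing = h_i @ u + h_j @ (-u)
    return np.array([dist, facing, 1.0])

def affinity_matrix(positions, yaws, w):
    """Symmetric matrix of pairwise affinities w . phi(i, j); diagonal left at 0."""
    n = len(yaws)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            A[i, j] = A[j, i] = w @ pair_features(positions[i], yaws[i], positions[j], yaws[j])
    return A

def greedy_correlation_clustering(A):
    """Greedy agglomerative approximation of correlation clustering: repeatedly merge
    the two clusters whose summed cross-affinity is largest while it stays positive."""
    clusters = [[i] for i in range(A.shape[0])]
    while True:
        best_gain, best_pair = 0.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gain = A[np.ix_(clusters[a], clusters[b])].sum()
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return clusters
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected

# Two hypothetical pairs of people facing each other, about 5 m apart.
positions = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0], [5.0, 1.0]])
yaws = np.array([0.0, np.pi, np.pi / 2, -np.pi / 2])
A = affinity_matrix(positions, yaws, W_HYP)
groups = greedy_correlation_clustering(A)
print(groups)  # [[0, 1], [2, 3]]
```

Correlation clustering is NP-hard in general, so the greedy merge is only one of several possible approximations. The benefit of the supervised setting is that learning the weights changes which pairs attract or repel each other, and therefore the resulting groups, from one scenario to another.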