With whom do I interact? Detecting social interactions in egocentric photo-streams

Given a user wearing a low-frame-rate wearable camera throughout a day, this work aims to automatically detect the moments when the user engages in a social interaction, solely by reviewing the photos captured automatically by the camera. The proposed method, inspired by the sociological concept of F-formation, exploits the distance and orientation, with respect to the user, of the individuals appearing in the scene, viewed from a bird's-eye perspective. As a result, the interaction pattern over a sequence can be understood as a two-dimensional time series that describes the temporal evolution of the distance and orientation features. A Long Short-Term Memory (LSTM) recurrent neural network is then trained to classify each time series. Experimental evaluation on a dataset of 30,000 images shows promising results for social interaction detection in egocentric photo-streams.
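
To make the pipeline described above more concrete, the following is a minimal sketch, not the authors' implementation. It pairs per-frame distance and orientation values into a two-dimensional time series and passes it through a single standard LSTM cell followed by a sigmoid output. The function names (`make_series`, `init_lstm`, `lstm_classify`), the hidden size, the feature units, and the random untrained weights are all assumptions made for illustration; the abstract does not specify the network configuration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_series(distances, orientations):
    """Pair per-frame distance and orientation into a 2-D time series."""
    if len(distances) != len(orientations):
        raise ValueError("distance and orientation sequences must have equal length")
    return [(float(d), float(o)) for d, o in zip(distances, orientations)]


def init_lstm(input_size, hidden_size, seed=0):
    """Create small random weights (hypothetical, untrained) for one LSTM cell."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    params = {}
    for gate in ("i", "f", "o", "g"):  # input, forget, output gates and candidate
        params["W" + gate] = mat(hidden_size, input_size + hidden_size)
        params["b" + gate] = [0.0] * hidden_size
    params["w_out"] = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
    params["b_out"] = 0.0
    return params


def affine(W, b, v):
    """Compute W @ v + b for a list-of-lists matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_classify(series, params):
    """Run the series through the LSTM cell and return a score in (0, 1)."""
    hidden_size = len(params["bi"])
    h = [0.0] * hidden_size  # hidden state
    c = [0.0] * hidden_size  # cell state
    for x in series:
        v = list(x) + h  # concatenate current input and previous hidden state
        i = [sigmoid(a) for a in affine(params["Wi"], params["bi"], v)]
        f = [sigmoid(a) for a in affine(params["Wf"], params["bf"], v)]
        o = [sigmoid(a) for a in affine(params["Wo"], params["bo"], v)]
        g = [math.tanh(a) for a in affine(params["Wg"], params["bg"], v)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    logit = sum(w * hj for w, hj in zip(params["w_out"], h)) + params["b_out"]
    return sigmoid(logit)


# Illustrative values only: assumed distances (meters) and orientations (radians).
series = make_series([1.2, 1.1, 1.0], [0.10, 0.05, 0.00])
score = lstm_classify(series, init_lstm(input_size=2, hidden_size=4))
```

With untrained weights the score carries no meaning; in practice the parameters would be learned from labeled sequences, typically with a deep learning framework rather than hand-written loops.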
