Paper information - With whom do I interact? Detecting social interactions in egocentric photo-streams

Given a user wearing a low-frame-rate wearable camera throughout a day, this work aims to automatically detect the moments when the user engages in a social interaction, using only the photos captured automatically by the camera. The proposed method, inspired by the sociological concept of F-formation, exploits the distance and orientation of the individuals appearing in the scene, measured with respect to the user from a bird's-eye-view perspective. As a result, the interaction pattern over a sequence can be understood as a two-dimensional time series that describes the temporal evolution of the distance and orientation features. A Long Short-Term Memory (LSTM)-based recurrent neural network is then trained to classify each time series. Experimental evaluation on a dataset of 30,000 images shows promising results for the proposed approach to social interaction detection in egocentric photo-streams.
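
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The coordinate convention, the function names (frame_features, lstm_step, classify_sequence), the single-unit LSTM, and all weight values are hypothetical and chosen only for readability. A real system would learn the weights from labelled sequences and would use a larger hidden state.

```python
import math


def frame_features(x, z, facing_x, facing_z):
    """Return (distance, orientation) for one person in one frame.

    Assumed convention: the camera wearer is at the origin of a bird's-eye
    (ground-plane) coordinate system. (x, z) is the person's position and
    (facing_x, facing_z) is the direction the person's face points.
    Orientation is the angle in degrees between that direction and the
    direction from the person to the wearer, so 0 means facing the wearer.
    """
    distance = math.hypot(x, z)
    norm = math.hypot(facing_x, facing_z) * distance
    if norm == 0.0:
        raise ValueError("position and facing direction must be non-zero")
    dot = facing_x * (-x) + facing_z * (-z)
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp rounding error
    return distance, math.degrees(math.acos(cos_angle))


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def lstm_step(inputs, h, c, params):
    """One step of a standard LSTM cell with a single hidden unit.

    params maps each gate name to (w_distance, w_orientation, u, b).
    """
    pre = {}
    for name in ("i", "f", "o", "g"):
        w_d, w_o, u, b = params[name]
        pre[name] = w_d * inputs[0] + w_o * inputs[1] + u * h + b
    i, f, o = sigmoid(pre["i"]), sigmoid(pre["f"]), sigmoid(pre["o"])
    g = math.tanh(pre["g"])
    c = f * c + i * g          # update the cell state
    h = o * math.tanh(c)       # expose the gated hidden state
    return h, c


def classify_sequence(series, params, w_out=1.0, b_out=0.0):
    """Run the cell over the whole series and score the final hidden state."""
    h = c = 0.0
    for step in series:
        h, c = lstm_step(step, h, c, params)
    return sigmoid(w_out * h + b_out)


# Hypothetical track of one person: (x, z, facing_x, facing_z) per frame.
track = [(0.3, 1.2, 0.0, -1.0), (0.2, 1.0, 0.0, -1.0), (0.1, 0.9, 0.1, -1.0)]
series = [frame_features(*obs) for obs in track]

# Untrained placeholder weights, identical for every gate.
params = {name: (-0.5, -0.01, 0.1, 0.5) for name in ("i", "f", "o", "g")}
score = classify_sequence(series, params)
print(series, score)
```

Here series is the two-dimensional time series (one distance and one orientation value per frame), and score is a value between 0 and 1 that a trained model would interpret as the likelihood of a social interaction. With the placeholder weights above, the number itself carries no meaning.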
