SR-clustering: Semantic regularized clustering for egocentric photo streams segmentation

While wearable cameras are becoming increasingly popular, locating relevant information in large unstructured collections of egocentric images is still a tedious and time-consuming process. This paper addresses the problem of organizing egocentric photo streams acquired by a wearable camera into semantically meaningful segments, hence making an important step towards the goal of automatically annotating these photos for browsing and retrieval. In the proposed method, contextual and semantic information is first extracted for each image using a Convolutional Neural Network. A vocabulary of concepts is then defined in a semantic space by relying on linguistic information. Finally, by exploiting the temporal coherence of concepts in photo streams, images that share contextual and semantic attributes are grouped together. The resulting temporal segmentation is particularly suited for further analysis, ranging from event recognition to semantic indexing and summarization. Experimental results on an egocentric dataset of nearly 31,000 images show that the proposed approach outperforms state-of-the-art methods.
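The core idea of exploiting temporal coherence can be illustrated with a minimal sketch: given per-image semantic descriptors (e.g. CNN concept scores), a stream is split wherever consecutive frames diverge semantically. This is a simplified stand-in for the paper's semantic-regularized clustering, not its actual algorithm; the function name and threshold are illustrative assumptions.

```python
import numpy as np

def temporal_segments(features, threshold=0.5):
    """Split a photo stream into temporally contiguous segments.

    features  : (n_frames, n_concepts) array of per-image semantic
                descriptors (hypothetical CNN concept scores).
    threshold : cosine-distance jump that triggers a new segment.

    Returns a list of index arrays, one per segment. This is only a
    toy illustration of temporal-coherence segmentation.
    """
    # L2-normalize each frame's descriptor
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    normed = features / np.clip(norms, 1e-12, None)
    # Cosine distance between each pair of consecutive frames
    dists = 1.0 - np.sum(normed[:-1] * normed[1:], axis=1)
    # Start a new segment after each large semantic jump
    boundaries = np.flatnonzero(dists > threshold) + 1
    return np.split(np.arange(len(features)), boundaries)
```

Because segment boundaries are only placed between consecutive frames, the resulting clusters are always temporally contiguous, which is the property that distinguishes photo-stream segmentation from generic image clustering.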
