Personalized Egocentric Video Summarization of Cultural Tour on User Preferences Input

In this paper, we propose a new method for customized summarization of egocentric videos according to specific user preferences, so that different users can extract different summaries from the same stream. Our approach, tailored to a cultural heritage scenario, creates a short synopsis of the original video built from key shots in which concepts relevant to the user's preferences can be visually detected, while preserving the chronological flow of the original video. Moreover, we release a new dataset of egocentric streams captured in uncontrolled scenarios, recording tourists' cultural visits in six art cities, with geolocalization information. Our experimental results show that the proposed approach is able to leverage users' preferences while emphasizing the chronological flow of the storyline and visual smoothness.
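
The idea of a preference-driven synopsis that keeps chronological order can be illustrated with a minimal sketch. This is not the method of the paper: the `Shot` structure, the weighted-sum relevance score, the greedy selection, and the duration budget are all assumptions introduced here for illustration only. Each shot is assumed to carry per-concept detection confidences, and the user's preferences are assumed to be a mapping from concept names to weights.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Shot:
    """Hypothetical shot record: time span plus detected-concept confidences."""
    start: float                 # start time in seconds
    end: float                   # end time in seconds
    concepts: Dict[str, float]   # concept name -> detection confidence


def relevance(shot: Shot, preferences: Dict[str, float]) -> float:
    """Assumed scoring rule: preference-weighted sum of concept confidences."""
    return sum(w * shot.concepts.get(c, 0.0) for c, w in preferences.items())


def summarize(shots: List[Shot], preferences: Dict[str, float],
              budget: float) -> List[Shot]:
    """Greedily keep the most relevant shots that fit in `budget` seconds,
    then return them in their original chronological order."""
    ranked = sorted(shots, key=lambda s: relevance(s, preferences), reverse=True)
    chosen: List[Shot] = []
    used = 0.0
    for shot in ranked:
        length = shot.end - shot.start
        # Skip shots with no relevant concept or that would exceed the budget.
        if relevance(shot, preferences) > 0 and used + length <= budget:
            chosen.append(shot)
            used += length
    # Restore the chronological flow of the original video.
    return sorted(chosen, key=lambda s: s.start)


# Illustrative data (invented for this example, not from the dataset).
shots = [
    Shot(0.0, 10.0, {"church": 0.9}),
    Shot(10.0, 20.0, {"street": 0.8}),
    Shot(20.0, 30.0, {"fresco": 0.6}),
    Shot(30.0, 40.0, {"church": 0.4}),
]
art_lover = summarize(shots, {"church": 1.0, "fresco": 1.0}, budget=20.0)
walker = summarize(shots, {"street": 1.0}, budget=20.0)
print([s.start for s in art_lover], [s.start for s in walker])
```

With the same input stream, the two preference profiles yield different summaries, and each summary lists its shots in the order they occurred. A real system would also need the visual-smoothness criterion mentioned in the abstract, which this sketch does not model.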
