Fast forwarding Egocentric Videos by Listening and Watching

The remarkable technological advances in well-equipped wearable devices are driving an increasing production of long first-person videos. However, because most of these videos contain long and tedious segments, they are often forgotten or never watched. Despite the large number of techniques proposed to fast-forward such videos by highlighting relevant moments, most of them rely on visual information alone, disregarding other relevant sensors available on current devices, such as high-definition microphones. In this work, we propose a new approach to fast-forward videos using psychoacoustic metrics extracted from the soundtrack. These metrics estimate the annoyance of a segment, allowing our method to emphasize moments of pleasant sound. The effectiveness of our method is demonstrated through qualitative results and quantitative results concerning speed-up and instability.
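To make the idea concrete, the sketch below scores soundtrack segments by a crude annoyance proxy and keeps the least annoying fraction to reach a target speed-up. This is a hypothetical illustration, not the paper's method: full psychoacoustic annoyance models (e.g., Zwicker's) combine loudness, sharpness, roughness, and fluctuation strength, whereas here loudness is approximated by RMS energy and sharpness by the spectral centroid; the function names `annoyance_scores` and `select_segments` are our own.

```python
import numpy as np

def annoyance_scores(audio, sr, n_segments):
    """Score each soundtrack segment with a crude annoyance proxy.

    Hypothetical stand-ins for psychoacoustic metrics:
    RMS energy approximates loudness, the spectral centroid
    approximates sharpness (high-frequency content).
    """
    scores = []
    for seg in np.array_split(np.asarray(audio, dtype=float), n_segments):
        rms = np.sqrt(np.mean(seg ** 2))                     # loudness proxy
        spectrum = np.abs(np.fft.rfft(seg))
        freqs = np.fft.rfftfreq(len(seg), d=1.0 / sr)
        centroid = (freqs * spectrum).sum() / (spectrum.sum() + 1e-12)
        # Louder and sharper segments get higher (worse) scores.
        scores.append(rms * (1.0 + centroid / (sr / 2)))
    return np.asarray(scores)

def select_segments(scores, speedup):
    """Keep the 1/speedup fraction of segments with the lowest annoyance."""
    k = max(1, len(scores) // speedup)
    return np.sort(np.argsort(scores)[:k])
```

For a soundtrack alternating quiet low-pitched and loud high-pitched passages, the selector retains the quiet segments, which is the behavior the abstract describes: emphasizing moments of sound pleasantness while achieving the requested speed-up.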
