Fast forwarding Egocentric Videos by Listening and Watching

The remarkable technological advances in well-equipped wearable devices are driving an increasing production of long first-person videos. However, because most of these videos contain long and tedious segments, they are often forgotten or never watched. Despite the large number of techniques proposed to fast-forward such videos by highlighting relevant moments, most of them rely on visual information alone, disregarding other relevant sensors available on current devices, such as high-definition microphones. In this work, we propose a new approach to fast-forward videos using psychoacoustic metrics extracted from the soundtrack. These metrics estimate the annoyance of a segment, allowing our method to emphasize moments of pleasant sound. The effectiveness of our method is demonstrated through qualitative results and quantitative results concerning speed-up and instability.
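To make the idea concrete, the sketch below scores soundtrack segments by a crude annoyance proxy and keeps the least annoying fraction to reach a target speed-up. This is a hypothetical illustration, not the paper's method: full psychoacoustic annoyance models (e.g., Zwicker's) combine loudness, sharpness, roughness, and fluctuation strength, whereas here loudness is approximated by RMS energy and sharpness by the spectral centroid; the function names `annoyance_scores` and `select_segments` are our own.

```python
import numpy as np

def annoyance_scores(audio, sr, n_segments):
    """Score each soundtrack segment with a crude annoyance proxy.

    Hypothetical stand-ins for psychoacoustic metrics:
    RMS energy approximates loudness, the spectral centroid
    approximates sharpness (high-frequency content).
    """
    scores = []
    for seg in np.array_split(np.asarray(audio, dtype=float), n_segments):
        rms = np.sqrt(np.mean(seg ** 2))                     # loudness proxy
        spectrum = np.abs(np.fft.rfft(seg))
        freqs = np.fft.rfftfreq(len(seg), d=1.0 / sr)
        centroid = (freqs * spectrum).sum() / (spectrum.sum() + 1e-12)
        # Louder and sharper segments get higher (worse) scores.
        scores.append(rms * (1.0 + centroid / (sr / 2)))
    return np.asarray(scores)

def select_segments(scores, speedup):
    """Keep the 1/speedup fraction of segments with the lowest annoyance."""
    k = max(1, len(scores) // speedup)
    return np.sort(np.argsort(scores)[:k])
```

For a soundtrack alternating quiet low-pitched and loud high-pitched passages, the selector retains the quiet segments, which is the behavior the abstract describes: emphasizing moments of sound pleasantness while achieving the requested speed-up.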
