Efficient Video Summarization Framework using EEG and Eye-tracking Signals

This paper proposes an efficient video summarization framework that conveys the gist of an entire video in a few key-frames or video skims. Existing video summarization frameworks rely on algorithms that use either low-level computer vision features or high-level, domain-specific features. However, humans, the ultimate users of the summarized video, remain the most neglected aspect of these frameworks. This paper therefore considers the human's role in summarization and introduces summarization techniques based on human visual attention. To understand human attention behavior, we designed and performed experiments with human participants using electroencephalogram (EEG) and eye-tracking technology. The EEG and eye-tracking data obtained from these experiments are processed simultaneously and used to segment the frames containing useful information from a considerable video volume. The frame segmentation thus relies primarily on the cognitive judgments of human beings. Using our approach, a video is summarized by ∼96.5% while maintaining high precision (∼0.98) and high recall (∼0.97). A comparison with state-of-the-art techniques demonstrates that the proposed approach yields ceiling-level performance at reduced computational cost in summarizing videos.
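As an illustration of how the reported figures could be computed, the following minimal sketch evaluates a key-frame summary against a reference set of relevant frames. It assumes that precision and recall are measured over frame indices and that "summarized by" denotes the fraction of frames discarded; the abstract does not state the exact evaluation protocol, and the function name `summary_metrics` and the toy frame sets are hypothetical.

```python
def summary_metrics(selected, relevant, total_frames):
    """Return (precision, recall, reduction) for a key-frame summary.

    selected     -- frame indices kept in the summary (assumed representation)
    relevant     -- frame indices marked as useful in the reference annotation
    total_frames -- number of frames in the original video
    """
    selected, relevant = set(selected), set(relevant)
    hits = len(selected & relevant)  # frames that are both kept and relevant
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Assumption: the reduction is the share of frames dropped from the video.
    reduction = 1.0 - len(selected) / total_frames
    return precision, recall, reduction


# Toy data, not taken from the paper: a 1000-frame video with 40 frames kept.
precision, recall, reduction = summary_metrics(range(0, 40), range(2, 42), 1000)
print(f"precision={precision:.2f} recall={recall:.2f} reduction={reduction:.1%}")
```

Under this reading, a reduction of about 96.5% would correspond to keeping roughly 35 of every 1000 frames.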
