Divide-and-conquer based summarization framework for extracting affective video content

Recent advances in multimedia technology have greatly increased the volume of available video data, creating a pressing need for efficient systems to manage it. Video summarization is one of the key techniques for accessing and managing large video libraries: it can extract the affective content of a video sequence to generate a concise representation of that sequence. Human attention models are an efficient means of affective content extraction. However, existing visual-attention-driven summarization frameworks have high computational cost and memory requirements, and they do not capture human attention accurately. To address these issues, we propose a divide-and-conquer based framework for efficient summarization of big video data. We divide the original video into shots and compute an attention model for each shot in parallel. A viewer's attention is based on multiple sensory perceptions, i.e., aural and visual, as well as the viewer's neuronal signals. The aural attention model is based on the Teager energy, instantaneous amplitude, and instantaneous frequency of the audio signal, whereas the visual attention model employs multi-scale contrast and motion intensity. The neuronal attention is computed from the beta-band frequencies of the neuronal signals. Next, an aggregated attention curve is generated using an intra- and inter-modality fusion mechanism. Finally, the affective content in each video shot is extracted. The fusion of multimedia and neuronal signals provides a bridge that links the digital representation of multimedia with the viewer's perceptions. Our experimental results indicate that the proposed shot-detection based divide-and-conquer strategy reduces time and computational complexity. Moreover, the proposed attention model accurately reflects user preferences and facilitates the extraction of highly affective and personalized summaries.
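
The aural features named above (Teager energy, amplitude, and frequency) can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the standard discrete Teager-Kaiser operator, Ψ[x(n)] = x(n)² − x(n−1)·x(n+1), and uses the DESA-2 energy separation algorithm as one common way to estimate amplitude and frequency from that energy. The function names `teager_energy` and `desa2` are hypothetical, and the abstract does not state which separation algorithm or pre-processing the framework actually uses.

```python
import math


def teager_energy(x):
    """Discrete Teager-Kaiser energy for the interior samples of x.

    psi[k] corresponds to sample n = k + 1 of x, because the operator
    x[n]^2 - x[n-1] * x[n+1] needs one neighbour on each side.
    """
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def desa2(x):
    """Estimate amplitude and frequency per sample with DESA-2.

    Assumption: DESA-2 is used here only as an illustration of energy
    separation; the source does not say which variant it employs.
    Returns (amplitudes, frequencies), with frequencies in radians per
    sample. Entry j of each list corresponds to sample n = j + 2 of x.
    """
    # Symmetric difference y(n) = x(n+1) - x(n-1); y[k] maps to n = k + 1.
    y = [x[n + 1] - x[n - 1] for n in range(1, len(x) - 1)]
    psi_x = teager_energy(x)  # psi_x[k] maps to n = k + 1
    psi_y = teager_energy(y)  # psi_y[j] maps to n = j + 2

    amplitudes, frequencies = [], []
    for j, energy_y in enumerate(psi_y):
        energy_x = psi_x[j + 1]  # align both energies at n = j + 2
        if energy_x <= 0.0 or energy_y <= 0.0:
            # The estimate is undefined here (e.g. silence); emit zeros.
            amplitudes.append(0.0)
            frequencies.append(0.0)
            continue
        # Clamp to the arccos domain to absorb rounding error.
        cos_arg = max(-1.0, min(1.0, 1.0 - energy_y / (2.0 * energy_x)))
        frequencies.append(0.5 * math.acos(cos_arg))
        amplitudes.append(2.0 * energy_x / math.sqrt(energy_y))
    return amplitudes, frequencies
```

For a pure tone x(n) = A·cos(Ωn + φ) with 0 < Ω < π/2, the Teager energy equals A²·sin²(Ω) at every interior sample and DESA-2 recovers A and Ω, which makes a synthetic sinusoid a convenient sanity check before applying the functions to real audio frames.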
