Spatial and Motion Saliency Prediction Method Using Eye Tracker Data for Video Summarization

Video summarization is the process of extracting the most significant content of a video and representing it in a concise form. Existing video summarization methods do not achieve satisfactory results for videos with camera movement and significant illumination changes. To address these problems, this paper proposes a new video summarization framework based on eye tracker data, because human eyes can track moving objects accurately in these cases. Smooth pursuit is the type of eye movement that occurs when a viewer follows a moving object in a video. This motivates a new method to distinguish smooth pursuit from other types of gaze points, such as fixations and saccades. Smooth pursuit gaze points provide only the locations of moving objects in a video frame; they indicate neither whether those objects are attractive to viewers (i.e., salient regions) nor how much the objects move. The extent of the salient regions and the amount of object motion are two important features for measuring the viewer's attention level, which in turn determines the key frames for video summarization. To find the most attractive objects, a new spatial saliency prediction method is also proposed, which constructs a saliency map around each smooth pursuit gaze point based on regions of the human visual field, namely the fovea, parafovea, and perifovea. To quantify object motion, the total distance between the current and previous gaze points of viewers during smooth pursuit is measured as a motion saliency score, on the rationale that eye gaze movement during smooth pursuit is related to the motion of the tracked objects. Finally, the spatial and motion saliency maps are combined to obtain an aggregated saliency score for each frame, and a set of key frames is selected based on a user-selected or system-default skimming ratio.
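The pipeline in the paragraph above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation: smooth pursuit is separated from fixations and saccades with a simple two-threshold gaze-speed rule (a swapped-in technique, since the abstract does not specify the classifier); the fovea, parafovea, and perifovea are modeled as concentric discs with hypothetical radii and weights; spatial and motion scores are min-max normalized and fused with an assumed weight `alpha`; and the names `classify`, `visual_field_map`, `frame_scores`, and `select_key_frames` are hypothetical.

```python
import math
import numpy as np

# ASSUMPTION: every threshold, radius and weight below is illustrative,
# not a value taken from the paper.
SACCADE_SPEED = 40.0  # px/frame at or above which a sample counts as a saccade
PURSUIT_SPEED = 3.0   # px/frame at or above which (below SACCADE_SPEED) it is pursuit
# (radius in px, weight) for fovea, parafovea, perifovea, innermost first
VISUAL_FIELD = [(20, 1.0), (50, 0.5), (90, 0.25)]


def classify(prev, cur):
    """Label a gaze sample by its displacement since the previous frame."""
    speed = math.dist(prev, cur)
    if speed >= SACCADE_SPEED:
        return "saccade"
    if speed >= PURSUIT_SPEED:
        return "pursuit"
    return "fixation"


def visual_field_map(points, height, width):
    """Per-pixel saliency: highest visual-field weight from any pursuit point."""
    ys, xs = np.mgrid[0:height, 0:width]
    sal = np.zeros((height, width))
    for x, y in points:
        d = np.hypot(xs - x, ys - y)
        ring = np.zeros((height, width))
        # Paint the outermost disc first so inner, higher-weight discs overwrite it.
        for radius, weight in reversed(VISUAL_FIELD):
            ring[d <= radius] = weight
        sal = np.maximum(sal, ring)
    return sal


def frame_scores(gaze, height, width):
    """gaze[v][f] = (x, y) of viewer v at frame f; returns (spatial, motion) lists."""
    n_frames = len(gaze[0])
    spatial, motion = [0.0] * n_frames, [0.0] * n_frames
    for f in range(1, n_frames):
        pursuit_pts = []
        for track in gaze:
            if classify(track[f - 1], track[f]) == "pursuit":
                pursuit_pts.append(track[f])
                # Motion saliency: gaze distance travelled during smooth pursuit.
                motion[f] += math.dist(track[f - 1], track[f])
        if pursuit_pts:
            # Spatial saliency: mean of the visual-field map over the frame.
            spatial[f] = float(visual_field_map(pursuit_pts, height, width).mean())
    return spatial, motion


def normalize(values):
    """Min-max scale to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def select_key_frames(gaze, height, width, skim_ratio=0.1, alpha=0.5):
    """Fuse normalized scores and keep the top skim_ratio of frames, in time order."""
    spatial, motion = frame_scores(gaze, height, width)
    fused = [alpha * s + (1 - alpha) * m
             for s, m in zip(normalize(spatial), normalize(motion))]
    k = max(1, round(skim_ratio * len(fused)))
    top = sorted(range(len(fused)), key=lambda i: fused[i], reverse=True)[:k]
    return sorted(top)


# Two hypothetical viewers watching six 200x200 frames.
gaze = [
    [(100, 100), (100, 100), (110, 100), (125, 100), (200, 10), (201, 10)],
    [(50, 50), (51, 50), (60, 55), (75, 60), (75, 60), (120, 150)],
]
print(select_key_frames(gaze, 200, 200, skim_ratio=0.34))  # → [2, 3]
```

In this toy input only frames 2 and 3 contain pursuit samples from both viewers, so they receive nonzero spatial and motion scores and are kept. Taking the per-pixel maximum rather than the sum means overlapping viewers do not inflate the spatial score; summing instead would reward agreement between viewers, and either choice is an assumption here. Real eye-tracker streams are noisy, so the raw displacement used above would normally be smoothed before thresholding.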
The proposed method is applied to the Office video dataset, which contains videos with camera movements and illumination changes. Experimental results confirm the superior performance of the proposed spatial and motion saliency prediction method compared with state-of-the-art methods.