Detecting and Removing Visual Distractors for Video Aesthetic Enhancement

Personal videos often contain visual distractors: objects that are captured accidentally and draw viewers' attention away from the main subjects. We propose a method that automatically detects and localizes these distractors by learning from a manually labeled dataset. To achieve spatially and temporally coherent detection, we extract features at the temporal-superpixel level and classify them with a traditional support vector machine (SVM) learning framework. We also experiment with end-to-end learning using convolutional neural networks, which achieves slightly higher performance than the other methods. The classification result is further refined in a postprocessing step based on graph-cut optimization. Experimental results show that our method achieves an accuracy of 81% and a recall of 86%. We demonstrate several ways of removing the detected distractors to improve video quality, including video hole filling, video frame replacement, and camera path replanning. User study results show that our method can significantly improve the aesthetic quality of videos.
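To make the graph-cut refinement step concrete, the sketch below shows one common way such a postprocess can be set up; it is an illustration under stated assumptions, not the energy or solver used in the paper. It assumes each temporal superpixel has a classifier probability of being a distractor, uses that probability as a unary cost, adds a constant Potts penalty between neighboring superpixels, and solves the resulting binary labeling exactly with a small max-flow/min-cut routine. The names `refine_labels`, `probs`, `edges`, and `smoothness` are hypothetical.

```python
from collections import deque


def refine_labels(probs, edges, smoothness=0.3):
    """Binary graph-cut refinement (illustrative; not the paper's exact energy).

    probs      -- assumed classifier probability that each node is a distractor
    edges      -- (i, j) pairs of spatially or temporally adjacent nodes
    smoothness -- assumed Potts penalty paid when neighbors get different labels
    Returns a list of labels: 1 = distractor, 0 = not a distractor.
    """
    n = len(probs)
    source, sink = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add_edge(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # make sure the reverse residual edge exists

    for i, p in enumerate(probs):
        add_edge(source, i, p)      # cut when node i is labeled 0: cost p
        add_edge(i, sink, 1.0 - p)  # cut when node i is labeled 1: cost 1 - p
    for i, j in edges:
        add_edge(i, j, smoothness)
        add_edge(j, i, smoothness)

    def reachable_from_source():
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest residual paths until none remain.
    while True:
        parent = reachable_from_source()
        if sink not in parent:
            break
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][w] for u, w in path)
        for u, w in path:
            cap[u][w] -= bottleneck
            cap[w][u] += bottleneck

    # Nodes still reachable from the source lie on the "distractor" side of the cut.
    return [1 if i in parent else 0 for i in range(n)]


# Toy example: five temporal superpixels linked in a chain.
probs = [0.9, 0.8, 0.4, 0.85, 0.9]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
raw = [1 if p > 0.5 else 0 for p in probs]
refined = refine_labels(probs, edges, smoothness=0.3)
```

In this toy chain the middle node falls below the 0.5 threshold on its own, but the smoothness term relabels it to agree with its confident neighbors, which is the kind of coherence a graph-cut postprocess is meant to enforce. Setting `smoothness=0` reduces the result to plain thresholding.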
