MoVieUp: Automatic Mobile Video Mashup

With the proliferation of mobile devices, people record videos of the same events anytime and anywhere. Although these crowdsourced videos are uploaded to the cloud and shared, the viewing experience remains limited by monotonous single-camera viewing, visual redundancy, and poor audio and video quality. In this paper, we present a fully automatic mobile video mashup system that runs in the cloud and combines recordings captured by multiple devices, from different view angles and at different time slots, into a single enriched, professional-looking audio-video stream. We summarize a set of computational filming principles for multicamera settings from a formal focus study. Based on these principles, given a set of recordings of the same event, our system synchronizes the recordings using audio fingerprints, assesses audio and video quality, detects video cut points, and generates video and audio mashups. The audio mashup is formulated as maximizing audio quality under the less-switching principle, while the video mashup is formulated as maximizing video quality and content diversity, constrained by the summarized filming principles. Our system differs from existing work in this field in three ways: 1) it is fully automatic; 2) it incorporates a set of computational, domain-specific filming principles summarized from a formal focus study; and 3) in addition to video, it also considers audio mashup, which is a key factor in user experience (UX) yet is often overlooked in existing research. Evaluations show that our system outperforms state-of-the-art video mashup techniques and thus provides a better UX.
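To make the audio mashup objective concrete, the following is a minimal sketch of one way to trade audio quality against switching. It is not the formulation used in the paper, which the abstract does not specify. It assumes that the synchronized timeline is already divided into segments, that a quality score is available for every recording in every segment, and that the less-switching principle can be approximated by a fixed penalty per switch. The function name `select_audio_sources`, the `quality` matrix, and the `switch_penalty` parameter are all hypothetical.

```python
from typing import List


def select_audio_sources(quality: List[List[float]],
                         switch_penalty: float) -> List[int]:
    """Pick one recording per segment by dynamic programming.

    quality[t][k] is an assumed quality score of recording k in segment t.
    The objective (an assumption, not the paper's) is the sum of selected
    scores minus switch_penalty for every change of recording.
    """
    if not quality:
        return []
    n_src = len(quality[0])
    # best[k]: best objective over all paths that end in recording k.
    best = list(quality[0])
    back: List[List[int]] = []  # back[t-1][k]: predecessor of k at segment t
    for t in range(1, len(quality)):
        # The best predecessor when switching is the overall best so far.
        top = max(range(n_src), key=lambda k: best[k])
        new_best, pointers = [], []
        for k in range(n_src):
            stay = best[k]
            switch = best[top] - switch_penalty
            if stay >= switch:
                new_best.append(stay + quality[t][k])
                pointers.append(k)
            else:
                new_best.append(switch + quality[t][k])
                pointers.append(top)
        best = new_best
        back.append(pointers)
    # Trace the optimal path backwards from the best final recording.
    k = max(range(n_src), key=lambda j: best[j])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path


if __name__ == "__main__":
    scores = [[0.9, 0.5], [0.8, 0.85], [0.9, 0.5]]
    print(select_audio_sources(scores, switch_penalty=0.2))
    print(select_audio_sources(scores, switch_penalty=0.0))
```

With a positive penalty, a brief and marginal quality gain from another recording does not justify two switches, so the selection stays on one source. With a penalty of zero, the selection simply follows the best-scoring recording in each segment.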
