Automatic multi-camera remix from single video

In this paper we present SmartView, a first-of-its-kind system that automatically creates a multi-camera video remix from a single video. We present a novel method that fuses multimodal content analysis with cinematic rules to create a multi-camera experience. Further, a playback-metadata-based model, which consists of playback instructions for a metadata-aware media player, provides the remix experience without editing the original video content. This approach has a low footprint, which makes it suitable for on-device processing on resource-constrained mobile devices. The research prototype demonstrates the feasibility of such a system on current off-the-shelf mobile devices. The SmartView creation process was observed to take less time than the duration of the video. Five of nine test users found the fully automatic SmartView remix experience better than conventional playback. The user-customized SmartView remix was preferred over conventional playback.
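
To make the playback-metadata idea concrete, the following is a minimal sketch of how such instructions might be represented and looked up by a metadata-aware player. The abstract does not specify the metadata format, so everything here is an assumption made for illustration: the names `ViewInstruction`, `crop_at`, and `to_pixels` are hypothetical, and modelling each virtual camera view as a normalized crop rectangle over a time range is only one plausible choice.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Crop rectangle as (x, y, width, height), normalized to the range 0..1.
# This representation is an assumption, not the paper's actual format.
Crop = Tuple[float, float, float, float]

FULL_FRAME: Crop = (0.0, 0.0, 1.0, 1.0)


@dataclass(frozen=True)
class ViewInstruction:
    """Hypothetical playback instruction: show `crop` during [start_s, end_s)."""
    start_s: float
    end_s: float
    crop: Crop


def crop_at(instructions: List[ViewInstruction], t: float) -> Crop:
    """Return the crop active at playback time t, or the full frame if none applies."""
    for ins in instructions:
        if ins.start_s <= t < ins.end_s:
            return ins.crop
    return FULL_FRAME


def to_pixels(crop: Crop, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized crop to pixel coordinates for a given frame size."""
    x, y, w, h = crop
    return (round(x * width), round(y * height), round(w * width), round(h * height))


# Example: a wide shot for the first 4 seconds, then a centered close-up.
remix = [
    ViewInstruction(0.0, 4.0, FULL_FRAME),
    ViewInstruction(4.0, 8.0, (0.25, 0.25, 0.5, 0.5)),
]
```

In this sketch the original video file is never modified: the player only reads the instruction list and applies the crop returned by `crop_at` at render time, which is in the spirit of the low-footprint, non-destructive approach described above.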
