Automatic mashup generation from multiple-camera concert recordings

A large number of videos are captured and shared by audience members at musical concerts. However, such recordings are typically perceived as boring, mainly because of their limited view, poor visual quality and incomplete coverage. Our objective is to enrich the viewing experience of these recordings by exploiting the abundance of content from multiple sources. In this paper, we propose a novel Virtual Director system that automatically combines the most desirable segments from different recordings into a single video stream, called a mashup. We start by eliciting requirements from focus groups, interviewing professional video editors and consulting the film grammar literature. We design a formal model for automatic mashup generation based on maximizing the degree of fulfillment of the requirements. Various audio-visual content analysis techniques are used to determine how well a recording satisfies the requirements. To validate the system, we compare our mashups with two other kinds of mashup: one manually created by a professional video editor and one machine-generated by random segment selection. The mashups are evaluated by 40 subjects in terms of visual quality, content diversity and pleasantness. The results show that our mashups and the manual mashups are perceived as comparable, while both are rated significantly higher than the random mashups on all three criteria.
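The selection idea described above — choosing, over time, the camera segment that best fulfills quality and diversity requirements — can be sketched as follows. This is an illustrative simplification, not the paper's formal model: the `Segment` class, the per-segment `quality` score and the greedy camera-switch bonus are all assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    camera: int     # which audience recording this segment comes from
    start: float    # seconds, relative to the synchronized timeline
    end: float
    quality: float  # hypothetical visual-quality score in [0, 1]

def make_mashup(segments_by_slot, diversity_weight=0.3):
    """Greedy sketch: for each time slot, pick the candidate segment
    that maximizes its quality score plus a bonus for switching to a
    different camera (rewarding content diversity). The actual system
    maximizes overall requirement fulfillment, not this greedy rule."""
    mashup, prev_cam = [], None
    for candidates in segments_by_slot:
        best = max(
            candidates,
            key=lambda s: s.quality
            + (diversity_weight if s.camera != prev_cam else 0.0),
        )
        mashup.append(best)
        prev_cam = best.camera
    return mashup

# Two time slots, two cameras: the diversity bonus makes the second
# slot switch to camera 1 even though camera 0 has slightly higher quality.
slots = [
    [Segment(0, 0, 5, 0.9), Segment(1, 0, 5, 0.5)],
    [Segment(0, 5, 10, 0.8), Segment(1, 5, 10, 0.6)],
]
print([s.camera for s in make_mashup(slots)])  # → [0, 1]
```

The diversity bonus is one simple way to encode a film-grammar rule (avoid lingering on a single viewpoint); the paper's formal model balances several such requirements jointly.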
