Making a scene: alignment of complete sets of clips based on pairwise audio match

As the amount of social video content captured at physical-world events and shared online rapidly increases, there is a growing need for robust methods for organizing and presenting that content. In this work, we significantly extend prior work on automatically detecting videos from an event that were captured at the same time, i.e., "overlapping" videos. We go beyond finding pairwise matches between video clips and describe the construction of scenes: sets of multiple overlapping videos, each presenting a coherent moment in the event. We test multiple strategies for scene construction, using a greedy algorithm to map videos into scenes and a clustering refinement step to increase the precision of each scene. We evaluate the strategies in multiple settings and show that the combined greedy and clustering approach yields the best balance between recall and precision in all settings.
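The two-stage idea described above (greedy assignment of clips to scenes, then a refinement pass that trades recall for precision) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names `greedy_scenes` and `refine_scene`, the `(clip_a, clip_b, score)` input format, the `threshold` and `min_support` parameters, and the support-based pruning used in place of a real clustering step are all assumptions made for illustration.

```python
from collections import defaultdict


def greedy_scenes(matches, threshold=0.5):
    """Greedily merge clips into scenes, strongest pairwise match first.

    `matches` is an iterable of (clip_a, clip_b, score) tuples; the score
    scale and the threshold are hypothetical, not taken from the paper.
    """
    parent = {}

    def find(clip):
        # Union-find lookup with path halving.
        parent.setdefault(clip, clip)
        while parent[clip] != clip:
            parent[clip] = parent[parent[clip]]
            clip = parent[clip]
        return clip

    for a, b, score in sorted(matches, key=lambda m: m[2], reverse=True):
        root_a, root_b = find(a), find(b)  # registers both clips
        if score >= threshold and root_a != root_b:
            parent[root_b] = root_a  # merge the two scenes

    scenes = defaultdict(set)
    for clip in list(parent):
        scenes[find(clip)].add(clip)
    return sorted((sorted(s) for s in scenes.values()), key=lambda s: s[0])


def refine_scene(scene, matches, threshold=0.5, min_support=0.5):
    """Keep only clips directly matched to enough other scene members.

    A simple stand-in for a clustering refinement step: a clip stays if it
    has above-threshold matches with at least `min_support` of the others.
    """
    members = set(scene)
    linked = defaultdict(set)
    for a, b, score in matches:
        if score >= threshold and a in members and b in members:
            linked[a].add(b)
            linked[b].add(a)
    needed = min_support * (len(members) - 1)
    return sorted(c for c in members if len(linked[c]) >= needed)


matches = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.2), ("d", "e", 0.7)]
scenes = greedy_scenes(matches)
strict = refine_scene(scenes[0], matches, min_support=1.0)
```

In this toy input, the greedy pass chains `a`, `b`, and `c` into one scene through transitive matches, which favors recall; a strict refinement then keeps only clips that match every other member, which favors precision at the cost of dropping weakly connected clips.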
