Discovery and organization of multi-camera user-generated videos of the same event

We propose a framework for the automatic grouping and alignment of unedited multi-camera User-Generated Videos (UGVs) within a database. The framework analyzes the audio to match and cluster UGVs that capture the same spatio-temporal event, and estimates their relative time shifts to align them temporally. We design a descriptor derived from the pairwise matching of the audio chroma features of UGVs. The descriptor facilitates the definition of a classification threshold for automatic query-by-example event identification. We evaluate the proposed identification and synchronization framework on a database of 263 multi-camera recordings of 48 real-world events and compare it with state-of-the-art methods. Experimental results show the effectiveness of the proposed approach in the presence of various audio degradations.
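
The abstract does not give the exact form of the chroma-matching descriptor or the clustering rule, so what follows is only a minimal Python sketch of the general idea, under stated assumptions. Each clip is assumed to be already reduced to a sequence of 12-bin chroma vectors at a common hop size. Two clips are compared by scanning candidate lags and taking the mean frame-wise cosine similarity over the overlapping frames. The best-scoring lag is read as the relative time shift (in frames, so multiply by the hop duration to get seconds). A fixed threshold (a hypothetical 0.9) decides whether a query clip shows the same event, and matching clips are grouped by connected components. The names `match_chroma`, `same_event` and `group_clips`, the threshold and the `min_overlap` parameter are illustrative and are not the authors' method.

```python
import math
import random
from typing import List, Sequence, Tuple

# Assumption: a clip is a precomputed sequence of 12-bin chroma vectors,
# one per analysis frame, with the same hop size for every clip.
ChromaSeq = Sequence[Sequence[float]]


def _cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two chroma frames (0.0 if either frame is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def match_chroma(a: ChromaSeq, b: ChromaSeq, max_lag: int,
                 min_overlap: int = 1) -> Tuple[int, float]:
    """Scan lags in [-max_lag, max_lag]; return (best_lag, best_score).

    Lag d pairs frame t of clip b with frame t + d of clip a, i.e. b starts
    d frames after a. The score is the mean frame-wise cosine similarity over
    the overlapping frames; lags with fewer than min_overlap overlapping
    frames are skipped. Returns (0, -1.0) if no lag qualifies.
    """
    best_lag, best_score = 0, -1.0
    for d in range(-max_lag, max_lag + 1):
        # Valid t satisfy 0 <= t < len(b) and 0 <= t + d < len(a).
        start, stop = max(0, -d), min(len(b), len(a) - d)
        if stop - start < max(min_overlap, 1):
            continue
        score = sum(_cosine(a[t + d], b[t]) for t in range(start, stop)) / (stop - start)
        if score > best_score:
            best_lag, best_score = d, score
    return best_lag, best_score


def same_event(a: ChromaSeq, b: ChromaSeq, max_lag: int, threshold: float = 0.9,
               min_overlap: int = 1) -> Tuple[bool, int]:
    """Query-by-example decision with a hypothetical threshold: (is_match, shift_in_frames)."""
    lag, score = match_chroma(a, b, max_lag, min_overlap)
    return score >= threshold, lag


def group_clips(clips: List[ChromaSeq], max_lag: int, threshold: float = 0.9,
                min_overlap: int = 1) -> List[List[int]]:
    """Group clip indices into connected components of the pairwise-match graph (assumed rule)."""
    parent = list(range(len(clips)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(clips)):
        for j in range(i + 1, len(clips)):
            if same_event(clips[i], clips[j], max_lag, threshold, min_overlap)[0]:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(clips)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-ins for chroma sequences, not real audio features.
    clip_a = [[rng.random() for _ in range(12)] for _ in range(40)]
    clip_b = clip_a[5:25]  # same "event", starting 5 frames after clip_a
    clip_c = [[rng.random() for _ in range(12)] for _ in range(20)]  # unrelated
    print(same_event(clip_a, clip_b, max_lag=20, min_overlap=15))  # (True, 5)
    print(group_clips([clip_a, clip_b, clip_c], max_lag=20, min_overlap=15))
```

Requiring a minimum overlap prevents very short overlaps, where a few chance-similar frames could dominate the mean, from producing spurious matches. The exhaustive lag scan costs time proportional to the number of lags times the overlap length. For long recordings, an FFT-based cross-correlation or a fingerprint index would likely scale better.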

Andrea Cavallaro | Sophia Bano
