BABAZ: A large scale audio search system for video copy detection

This paper presents BABAZ, an audio search system for finding modified segments in large databases of music or video tracks. It is based on an efficient audio feature matching system that exploits reciprocal nearest neighbors to produce a per-match similarity score. Temporal consistency among the audio matches is taken into account, and boundary estimation allows precise localization of the matching segments. The method is mainly intended for retrieving videos based on their audio tracks, as typically evaluated in the copy detection task of the TRECVID evaluation campaigns. Our evaluation on music retrieval shows that the system is comparable to a reference audio fingerprinting system, and it significantly outperforms that system on audio-based video retrieval, as shown by our experiments on the dataset used in the copy detection task of the TRECVID'2010 campaign.
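To make the matching idea concrete, the following is a minimal sketch of reciprocal nearest neighbor filtering followed by a temporal consistency vote. It is an illustration under stated assumptions, not the BABAZ implementation: the abstract does not specify the descriptors, the scoring function, or the search structure, so the brute-force search, the hard mutual-neighbor test (instead of a graded per-match score), the offset voting, and all function names here are hypothetical.

```python
from collections import Counter

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, pool):
    """Index of the descriptor in `pool` closest to `vec` (brute force)."""
    return min(range(len(pool)), key=lambda j: sq_dist(vec, pool[j]))

def reciprocal_matches(query, database):
    """Keep (i, j) only if j is the nearest neighbor of query[i] in the
    database AND i is the nearest neighbor of database[j] in the query.
    Assumption: a hard mutual test stands in for a per-match score."""
    matches = []
    for i, q in enumerate(query):
        j = nearest(q, database)
        if nearest(database[j], query) == i:
            matches.append((i, j))
    return matches

def best_offset(matches, query_times, db_times):
    """Vote on the time shift db_time - query_time; temporally consistent
    matches agree on one shift. Returns (offset, votes) or None."""
    votes = Counter(db_times[j] - query_times[i] for i, j in matches)
    return votes.most_common(1)[0] if votes else None

# Toy data: three query frames that reappear in the database 11 time units later.
query = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
database = [[9.0, 9.0], [0.1, 0.0], [1.0, 1.1], [5.2, 5.0]]
query_times = [0, 1, 2]
db_times = [10, 11, 12, 13]

matches = reciprocal_matches(query, database)
print(matches)
print(best_offset(matches, query_times, db_times))
```

The span of query and database times covered by the matches that voted for the winning offset gives a crude estimate of the segment boundaries. A large-scale system would replace the brute-force search with an approximate nearest neighbor index.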
