Waveprint: Efficient wavelet-based audio fingerprinting

In this paper, we present Waveprint, a novel method for audio identification. Waveprint combines computer-vision techniques with large-scale data-stream processing algorithms to create compact fingerprints of audio data that can be matched efficiently. The resulting system has excellent identification capabilities for small snippets of audio that have been degraded in a variety of ways, including competing noise, poor recording quality, and cell-phone playback. We explicitly measure the tradeoffs between performance, memory usage, and computation through extensive experimentation. Compared with previous state-of-the-art systems, Waveprint requires less memory and computation while being more accurate. The applications of Waveprint include song identification for end-consumer use, copyright protection for audio assets, copyright protection for television assets, and synchronization of off-line audio sources, such as live television.
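
The abstract does not spell out the fingerprinting pipeline, so the following is only a minimal sketch of the general idea, under stated assumptions: a frame of spectral energies is decomposed with a Haar wavelet, only the signs of the largest-magnitude coefficients are kept, and the resulting sparse bit set is compressed with min-hash so that similar frames tend to share signature entries. The function names (`haar_1d`, `top_sign_bits`, `min_hash`), the one-dimensional input, and all parameter values are hypothetical choices made for illustration and are not taken from the paper.

```python
import random


def haar_1d(values):
    """Full 1-D Haar decomposition using pairwise averages and differences.

    Assumes len(values) is a power of two (an illustrative restriction).
    """
    out = [float(v) for v in values]
    n = len(out)
    while n > 1:
        half = n // 2
        avg = [(out[2 * i] + out[2 * i + 1]) / 2.0 for i in range(half)]
        diff = [(out[2 * i] - out[2 * i + 1]) / 2.0 for i in range(half)]
        out[:n] = avg + diff  # coarse averages first, detail coefficients after
        n = half
    return out


def top_sign_bits(coeffs, keep):
    """Keep the `keep` largest-magnitude coefficients and encode only their signs.

    Coefficient i maps to bit 2*i if positive and bit 2*i + 1 if negative.
    """
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    return {2 * i + (0 if coeffs[i] > 0 else 1) for i in order[:keep] if coeffs[i] != 0}


def min_hash(bit_set, num_perms, universe, seed=0):
    """Min-hash signature: for each random permutation, the smallest permuted index.

    Explicit permutations are used for clarity; this is only practical for a
    small universe, as in this toy example.
    """
    if not bit_set:
        return [universe] * num_perms  # sentinel for an empty set
    rng = random.Random(seed)
    signature = []
    for _ in range(num_perms):
        perm = list(range(universe))
        rng.shuffle(perm)
        signature.append(min(perm[b] for b in bit_set))
    return signature


# Toy frame of 8 spectral energies (made-up numbers).
frame = [4, 2, 6, 8, 1, 3, 5, 7]
coeffs = haar_1d(frame)
bits = top_sign_bits(coeffs, keep=3)
signature = min_hash(bits, num_perms=4, universe=2 * len(frame), seed=1)
```

Two frames whose retained sign bits overlap heavily agree on each min-hash entry with probability equal to the Jaccard similarity of their bit sets, which is what makes such signatures suitable for approximate lookup in a hash table.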
