Robust Quad-Based Audio Fingerprinting

We propose an audio fingerprinting method that adapts findings from the field of blind astrometry to define simple, efficiently representable characteristic feature combinations called quads. Based on these, we describe an audio identification algorithm that is robust to noise and to severe time-frequency scale distortions, and that accurately identifies the underlying scale transform factors. The low number and compact representation of content features allow for efficient application of exact fixed-radius near-neighbor search methods for fingerprint matching in large audio collections. We demonstrate the practicability of the method on a collection of 100,000 songs, analyze its performance under a diverse set of noise types as well as severe speed, tempo, and pitch scale modifications, and identify a number of advantages of our method over two state-of-the-art distortion-robust audio identification algorithms.
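To make the idea of a quad concrete, the following is a minimal sketch, not the implementation from the paper. It assumes a quad consists of four spectral peaks given as (time, frequency) pairs, where peaks A and B span an axis-parallel box and peaks C and D lie inside it, following the quad construction known from blind astrometry. The function names `quad_hash`, `scale_factors`, and `fixed_radius_matches` are hypothetical, and the linear scan merely stands in for whatever spatial index a real system would use for exact fixed-radius search.

```python
from math import dist

def quad_hash(a, b, c, d):
    """Map C and D into the frame where A becomes (0, 0) and B becomes (1, 1).

    The result does not change when the time and frequency axes are
    translated or scaled independently (assumed quad layout, see lead-in).
    """
    (ax, ay), (bx, by) = a, b
    w, h = bx - ax, by - ay
    if w == 0 or h == 0:
        raise ValueError("A and B must differ in both time and frequency")
    return ((c[0] - ax) / w, (c[1] - ay) / h,
            (d[0] - ax) / w, (d[1] - ay) / h)

def scale_factors(ref_a, ref_b, qry_a, qry_b):
    """Estimate (time scale, frequency scale) from the boxes of two matched quads."""
    time_scale = (qry_b[0] - qry_a[0]) / (ref_b[0] - ref_a[0])
    freq_scale = (qry_b[1] - qry_a[1]) / (ref_b[1] - ref_a[1])
    return time_scale, freq_scale

def fixed_radius_matches(query, reference_hashes, radius):
    """Exact fixed-radius search by linear scan over 4-D hashes.

    Returns the indices of all reference hashes within `radius` of `query`.
    """
    return [i for i, ref in enumerate(reference_hashes)
            if dist(query, ref) <= radius]

# Illustrative peaks (made-up values): reference quad and a distorted copy.
ref_quad = [(10.0, 100.0), (30.0, 300.0), (15.0, 250.0), (25.0, 150.0)]
# Distortion: time stretched by 1.2 and shifted by 5, frequency scaled by 0.9.
qry_quad = [(t * 1.2 + 5.0, f * 0.9) for t, f in ref_quad]

ref_hash = quad_hash(*ref_quad)
qry_hash = quad_hash(*qry_quad)
unrelated = (0.9, 0.1, 0.2, 0.8)

hits = fixed_radius_matches(qry_hash, [unrelated, ref_hash], radius=0.01)
est_time, est_freq = scale_factors(ref_quad[0], ref_quad[1],
                                   qry_quad[0], qry_quad[1])
```

In this sketch the hash of the distorted quad coincides with the reference hash up to rounding, so the radius search retrieves it, and the ratio of the box sizes recovers the applied time and frequency scale factors. Because each hash is only four numbers, an exact spatial index over the hashes remains practical, which is the property the abstract refers to.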
