Fast GPU audio identification

Audio identification is the ability to pair audio signals of the same perceptual nature. In other words, the aim is to compare an audio signal with modified but perceptually equivalent versions of it. To accomplish this, an audio fingerprint (AFP) is extracted from each signal, and only the fingerprints are compared to assess similarity. Some guarantee must be given that comparing audio fingerprints is equivalent to perceptually comparing the signals. In designing AFPs, a dense representation is more robust than a sparse one, but it also requires more compute cycles and hence processes more slowly. To speed up the computation of a very dense audio fingerprint that remains stable under noise, re-recording, low-pass filtering, etc., we propose a massively parallel architecture based on the Graphics Processing Unit (GPU), programmed with the CUDA toolkit. We show experimentally that even with a relatively small GPU and using a single core in the GPU, we obtain a notable speedup per core in a GPU/CPU model. We compared our FFT implementation against the state-of-the-art CUFFT library and obtained impressive results; hence, our FFT implementation may also benefit other application areas.
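
The abstract does not describe the authors' GPU FFT kernel. As a hedged illustration of why an FFT maps well onto a massively parallel device, the sketch below is a plain-Python iterative radix-2 Cooley-Tukey FFT. It is the textbook algorithm, not the authors' implementation and not CUFFT. In each of the log2(N) stages, every butterfly reads and writes its own disjoint pair of elements and computes its own twiddle factor, so on a GPU each butterfly could be assigned to one thread. That thread mapping is an assumption for illustration, and the names `fft_radix2` and `dft_naive` are hypothetical.

```python
import cmath


def fft_radix2(x):
    """Iterative radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]

    # Bit-reversal permutation so the butterflies can run in place.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

    # log2(n) stages. Inside one stage, every (start, k) butterfly touches
    # a disjoint pair of elements, so all of them could run concurrently
    # (e.g. one GPU thread per butterfly, an assumed mapping).
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                # Each butterfly computes its own twiddle factor directly.
                w = cmath.exp(-2j * cmath.pi * k / size)
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
        size *= 2
    return a


def dft_naive(x):
    """O(n^2) reference DFT, used only to check fft_radix2."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    signal = [0.5, -1.0, 2.0, 3.5, 0.0, 1.25, -0.75, 4.0]
    fast = fft_radix2(signal)
    slow = dft_naive(signal)
    print(max(abs(f - s) for f, s in zip(fast, slow)))
```

The O(N log N) stage structure, rather than the O(N^2) direct DFT, is what makes a dense spectral fingerprint affordable; the independence of butterflies within a stage is what a parallel implementation would exploit.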
