Audio fingerprinting based on normalized spectral subband centroids

In multimedia fingerprinting, it is crucial to extract features that capture the distinguishing characteristics of a multimedia object, because the choice of features directly determines the performance of the entire fingerprinting system. This paper proposes a novel audio fingerprinting method based on normalized spectral subband centroids. The spectral subband centroid is selected for its resilience against equalization, compression, and noise addition. Both the reliability and the robustness of the fingerprinting system are addressed. Experimental results show that the proposed method is not only reliable but also robust against various audio processing steps, including MP3 compression, equalization, random start, time-scale modification, and linear speed change.
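To make the feature concrete, the following is a minimal sketch of how a normalized spectral subband centroid could be computed for one audio frame. It is not the paper's implementation: the function name `normalized_subband_centroids`, the Hann window, the band edges, and the normalization (centroid offset from the band center divided by half the bandwidth, giving values in [-1, 1]) are all assumptions chosen for illustration.

```python
import numpy as np


def normalized_subband_centroids(frame, sample_rate, band_edges_hz):
    """Return one normalized centroid per subband for a single frame.

    Hypothetical helper: the windowing and the normalization are
    illustrative assumptions, not taken from the paper.
    """
    frame = np.asarray(frame, dtype=float)
    # Power spectrum of the Hann-windowed frame (assumed window choice).
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    centroids = []
    for lo, hi in zip(band_edges_hz[:-1], band_edges_hz[1:]):
        in_band = (freqs >= lo) & (freqs < hi)
        total = power[in_band].sum()
        center = 0.5 * (lo + hi)
        if total <= 0.0:
            # Empty or silent band: fall back to the band center.
            centroid = center
        else:
            # Power-weighted mean frequency inside the band.
            centroid = (freqs[in_band] * power[in_band]).sum() / total
        # Assumed normalization: offset from band center over half-bandwidth.
        centroids.append((centroid - center) / (0.5 * (hi - lo)))
    return np.array(centroids)


if __name__ == "__main__":
    sr = 8000
    t = np.arange(1024) / sr
    tone = np.sin(2 * np.pi * 1250.0 * t)
    print(normalized_subband_centroids(tone, sr, [500.0, 1500.0, 2500.0]))
```

Because the centroid is a power-weighted mean frequency, scaling the whole band by a constant gain leaves it unchanged, which gives some intuition for the resilience to equalization mentioned above.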
