MASK: Robust Local Features for Audio Fingerprinting

This paper presents a novel local audio fingerprint called MASK (Masked Audio Spectral Keypoints) that effectively encodes the acoustic information present in audio documents and discriminates between transformed versions of the same acoustic document and unrelated documents. The fingerprint is designed to be resilient to strong transformations of the original signal and to be usable for generic audio, including music and speech. Its main characteristics are locality, binary encoding, robustness and compactness. The proposed fingerprint encodes the local spectral energies around salient points selected among the main spectral peaks of a given signal. A carefully designed mask is centered on each salient point; the mask defines regions of the spectrogram whose average energies are compared with each other. Each comparison yields a single bit, depending on which region has more energy, and all bits are grouped into the final binary fingerprint. The fingerprint also stores the frequency of each peak, quantized using a Mel filterbank. The length of the fingerprint is determined solely by the number of region comparisons used, and can be adapted to the requirements of any particular application. The number of salient points encoded per second can also be easily modified. In the experimental section we show the suitability of the fingerprint for finding matching segments, using the NIST-TRECVID benchmarking evaluation datasets and comparing it with a well-known fingerprint, obtaining up to 26% relative improvement in NDCR score.
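The region-comparison step described above can be illustrated with a minimal Python sketch. This is not the mask used in the paper: the region layout `REGIONS`, the comparison list `PAIRS`, and the function names are hypothetical, chosen only to show how one bit per comparison is packed into a binary code. It assumes a magnitude spectrogram stored as a NumPy array indexed `[frequency, time]` and a salient point far enough from the borders that every region lies inside the array. Peak selection and the Mel quantization of the peak frequency are left out.

```python
import numpy as np

# Hypothetical mask: each region is (time_offset, freq_offset, width, height),
# measured in spectrogram bins relative to the salient point.
REGIONS = [
    (-4, -2, 2, 2), (-2, -2, 2, 2), (0, -2, 2, 2), (2, -2, 2, 2),
    (-4, 1, 2, 2), (-2, 1, 2, 2), (0, 1, 2, 2), (2, 1, 2, 2),
]

# Hypothetical comparison list: pairs of indices into REGIONS.
# The number of pairs fixes the fingerprint length in bits.
PAIRS = [(0, 1), (1, 2), (2, 3), (4, 5), (5, 6), (6, 7), (0, 4), (3, 7)]


def region_mean(spec, t, f, region):
    """Average energy of one mask region placed around the point (t, f)."""
    dt, df, width, height = region
    t0, f0 = t + dt, f + df
    return float(spec[f0:f0 + height, t0:t0 + width].mean())


def encode_point(spec, t, f):
    """Pack one bit per region comparison into an integer code.

    A bit is 1 when the first region of the pair has strictly more
    average energy than the second, and 0 otherwise.
    """
    code = 0
    for a, b in PAIRS:
        first = region_mean(spec, t, f, REGIONS[a])
        second = region_mean(spec, t, f, REGIONS[b])
        code = (code << 1) | int(first > second)
    return code
```

With this layout, a spectrogram whose energy decreases steadily along the time axis sets every bit that compares an earlier region with a later one, while pairs of regions covering the same time span tie and yield 0. Adding or removing entries in `PAIRS` changes the code length, which mirrors the statement that the fingerprint length depends only on the number of comparisons.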
