Detection of repeating items in audio streams using data-driven ALISP sequencing

Radio streams often contain redundant parts. Commercials on radio or television stations, songs on music channels, and jingles broadcast before a specific radio or TV show are some of the repeating objects found in multimedia streams. In this paper, an audio fingerprinting system to detect repeating objects in audio streams is proposed. To address this problem, the ARGOS segmentation framework is used. This framework is combined with the ALISP-based audio fingerprinting system to build a new audio motif detection system. A matching algorithm inspired by the BLAST technique is applied to speed up the approximate string matching that locates the repeating items in the audio streams. Most of the audio motif discovery systems proposed in the literature are evaluated on repeating songs of long duration (about 5 min). In our case, the ALISP-based system is evaluated on advertisements and songs whose duration can vary from a few seconds to several minutes. The system is evaluated on 21 days of broadcasts from 3 French radio stations. On a set of 3081 repeating songs and 1315 repeating advertisements, a mean recall rate of 98% with a corresponding precision of 99% was achieved. The results show that the system is robust against the different kinds of distortion present in radio streams.
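
The BLAST-inspired matching step mentioned above can be illustrated with a small seed-and-verify sketch. This is not the authors' implementation: it assumes the audio has already been transcribed into a sequence of discrete symbols (standing in for ALISP unit labels), and the function names, the seed length `k`, and the threshold `max_dist_ratio` are all hypothetical choices made for illustration. Exact k-gram seeds propose candidate alignments, and only those candidates are checked with the more expensive Levenshtein distance.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Edit distance between two symbol sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]


def build_index(stream, k):
    """Map every k-gram of the stream to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(stream) - k + 1):
        index[tuple(stream[i:i + k])].append(i)
    return index


def find_repeats(query, stream, k=3, max_dist_ratio=0.25):
    """Return (start, distance) pairs where the query approximately occurs.

    Assumption: k and max_dist_ratio are illustrative values, not
    parameters taken from the paper.
    """
    index = build_index(stream, k)
    # Seeding: each exact k-gram hit votes for an alignment offset.
    votes = defaultdict(int)
    for q in range(len(query) - k + 1):
        for s in index.get(tuple(query[q:q + k]), ()):
            votes[s - q] += 1
    # Verification: run the edit distance only on seeded offsets.
    hits = []
    for offset in sorted(votes):
        start = max(offset, 0)
        window = stream[start:start + len(query)]
        dist = levenshtein(query, window)
        if dist <= max_dist_ratio * len(query):
            hits.append((start, dist))
    return hits


# Toy stream with the motif "KLMNOPQR" at positions 4 and 16.
stream = list("abcdKLMNOPQRefghKLMNOPQRij")
# Query is the motif with one substituted symbol (O -> X).
print(find_repeats(list("KLMNXPQR"), stream))  # [(4, 1), (16, 1)]
```

The speed-up comes from the seeding stage: stream positions that share no exact k-gram with the query are never passed to the edit distance computation. A fuller BLAST-style variant would extend each seed with a running score instead of comparing a fixed-length window.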
