An efficient method for the unsupervised discovery of signalling motifs in large audio streams

Providing effective tools to navigate and access long audio archives, or to monitor and classify broadcast streams, is an extremely challenging task. The main difficulties stem from the varied nature of the patterns of interest in a composite audio environment, the massive size of such databases, and the need to perform well when prior knowledge about the audio content is scarce or absent. This paper proposes a computational architecture for discovering occurrences of repeating patterns in audio streams by means of unsupervised learning. The targeted repetitions (or motifs) are called signalling, by analogy with biological nomenclature, as they refer to a broad class of audio patterns (such as jingles, songs, advertisements, etc.) that occur frequently in broadcast audio. We adapt a system originally developed for word discovery applications and demonstrate its effectiveness in a song discovery scenario. The adaptation consists in speeding up critical parts of the computations, mostly through audio feature coarsening, to cope with the long occurrence period of repeating songs in radio streams.
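
To make the idea of audio feature coarsening concrete, the sketch below shows one common way it can be realised: averaging non-overlapping blocks of consecutive feature frames (e.g., MFCCs) to reduce the temporal resolution, and hence the cost of any pairwise frame comparison, by roughly the square of the coarsening factor. This is only an illustrative assumption; the function name, the choice of block averaging, and the MFCC dimensions are hypothetical and not taken from the paper.

```python
import numpy as np

def coarsen_features(features: np.ndarray, factor: int) -> np.ndarray:
    """Temporally coarsen a (n_frames, n_dims) feature matrix by averaging
    non-overlapping blocks of `factor` consecutive frames.

    Generic illustration only; the paper's exact coarsening scheme may differ.
    """
    n_frames, n_dims = features.shape
    n_blocks = n_frames // factor            # drop any trailing partial block
    trimmed = features[:n_blocks * factor]
    return trimmed.reshape(n_blocks, factor, n_dims).mean(axis=1)

# Example: 10,000 MFCC-like frames of dimension 13, coarsened by a factor of 4,
# which shrinks a pairwise similarity computation by roughly a factor of 16.
mfcc = np.random.randn(10_000, 13)
coarse = coarsen_features(mfcc, factor=4)
print(coarse.shape)  # (2500, 13)
```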
