ALISP-Based Data Compression for Generic Audio Indexing

In this paper, we propose a generic framework for indexing and retrieving audio. In this framework, audio data is transformed into a sequence of symbols using ALISP (Automatic Language Independent Speech Processing) tools, which yields a compact representation of the audio. An approximate matching algorithm inspired by the BLAST (Basic Local Alignment Search Tool) technique is then used to retrieve most of the audio items that may be present in a radio stream. The proposed systems are evaluated on a private radio broadcast database provided by YACAST and on other publicly available corpora. Experimental results show excellent performance in audio identification (advertisements and songs), audio motif discovery (advertisements and songs), speaker diarization, and laughter detection. Moreover, the ALISP-based system obtained the best results for the speaker diarization task in the ETAPE 2011 (Evaluations en Traitement Automatique de la Parole) evaluation campaign.
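The retrieval step can be illustrated with a minimal BLAST-style seed-and-extend search over symbol sequences. This is a sketch under stated assumptions, not the authors' matcher. It assumes the ALISP transcription has already turned the audio into a sequence of discrete symbols (single letters here). The function names, seed length `k`, match/mismatch scores, `x_drop` threshold and `min_score` cutoff are all hypothetical illustration choices. It also performs only ungapped extension, and the real system may handle symbol insertions and deletions differently.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Sequence, Tuple

# Sketch only: names, seed length, scores and thresholds are hypothetical,
# not taken from the ALISP-based system described in the paper.


class Hit(NamedTuple):
    stream_start: int  # start of the aligned region in the stream
    query_start: int   # start of the aligned region in the query
    length: int        # number of aligned symbols (ungapped)
    score: int         # match/mismatch score of the aligned region


def build_seed_index(query: Sequence[str], k: int) -> Dict[Tuple[str, ...], List[int]]:
    """Map every length-k window (seed) of the query to its start positions."""
    index: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[tuple(query[i:i + k])].append(i)
    return index


def extend_seed(query: Sequence[str], stream: Sequence[str], q: int, s: int, k: int,
                match: int = 1, mismatch: int = -1, x_drop: int = 3) -> Hit:
    """Ungapped extension of an exact seed hit in both directions, X-drop stopping."""
    # Right extension: walk past the seed while the running score stays
    # within x_drop of the best score seen so far.
    best_right = right_len = run = 0
    i = 0
    while q + k + i < len(query) and s + k + i < len(stream):
        run += match if query[q + k + i] == stream[s + k + i] else mismatch
        i += 1
        if run > best_right:
            best_right, right_len = run, i
        elif best_right - run > x_drop:
            break
    # Left extension: same rule, walking backwards from the seed start.
    best_left = left_len = run = 0
    i = 1
    while q - i >= 0 and s - i >= 0:
        run += match if query[q - i] == stream[s - i] else mismatch
        if run > best_left:
            best_left, left_len = run, i
        elif best_left - run > x_drop:
            break
        i += 1
    return Hit(stream_start=s - left_len, query_start=q - left_len,
               length=left_len + k + right_len,
               score=k * match + best_left + best_right)


def blast_like_search(query: Sequence[str], stream: Sequence[str], k: int = 4,
                      min_score: int = 8, x_drop: int = 3) -> List[Hit]:
    """Find approximate occurrences of a query symbol sequence in a stream."""
    index = build_seed_index(query, k)
    hits = set()
    for s in range(len(stream) - k + 1):
        # Every exact k-symbol match is a seed; extend it and keep strong hits.
        for q in index.get(tuple(stream[s:s + k]), []):
            hit = extend_seed(query, stream, q, s, k, x_drop=x_drop)
            if hit.score >= min_score:
                hits.add(hit)  # seeds on the same diagonal often yield the same hit
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical symbol strings: the query symbol "E" is corrupted to "X"
    # in the stream, yet the occurrence is still recovered.
    print(blast_like_search("ABCDEFGHIJ", "QQQABCDXFGHIJQQQ"))
    # -> [Hit(stream_start=3, query_start=0, length=10, score=8)]
```

Indexing the short query (for example, the transcription of a known advertisement) and scanning the long stream mirrors how BLAST indexes query words and scans a database. The X-drop rule stops an extension once the running score falls too far below its best, so isolated symbol errors are tolerated while unrelated regions are not.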
