MARSYAS: a framework for audio analysis

Existing audio tools handle the increasing amount of computer audio data inadequately. The typical tape-recorder paradigm for audio interfaces is inflexible and time consuming, especially for large data sets. On the other hand, completely automatic audio analysis and annotation is impossible using current techniques. Alternative solutions are semi-automatic user interfaces that let users interact with sound in flexible ways based on content. This approach offers significant advantages over manual browsing, annotation and retrieval. Furthermore, it can be implemented using existing techniques for audio content analysis in restricted domains. This paper describes MARSYAS, a framework for experimenting, evaluating and integrating such techniques. As a test for the architecture, some recently proposed techniques have been implemented and tested. In addition, a new method for temporal segmentation based on audio texture is described. This method is combined with audio analysis techniques and used for hierarchical browsing, classification and annotation of audio files.

[1]  Paul Mermelstein,et al.  Experiments in syllable-based recognition of continuous speech , 1980, ICASSP.

[2]  Malcolm Slaney,et al.  Construction and evaluation of a robust multifeature speech/music discriminator , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[3]  Eric D. Scheirer,et al.  Music Content Analysis through Models of Audition , 1998 .

[4]  Barry Arons,et al.  SpeechSkimmer: a system for interactively skimming recorded speech , 1997, TCHI.

[5]  Richard F. Lyon,et al.  A perceptual pitch detector , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[6]  George Tzanetakis,et al.  A framework for audio analysis based on classification and temporal segmentation , 1999, Proceedings 25th EUROMICRO Conference. Informatics: Theory and Practice for the New Millennium.

[7]  Jonathan Foote,et al.  An overview of audio information retrieval , 1999, Multimedia Systems.

[8]  Xavier Rodet,et al.  Features extraction and temporal segmentation of acoustic signals , 1998, ICMC.

[9]  George Tzanetakis,et al.  Experiments in computer-assisted annotation of audio , 2000 .

[10]  J. Makhoul,et al.  Linear prediction: A tutorial review , 1975, Proceedings of the IEEE.

[11]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[12]  Don Kimber,et al.  Acoustic Segmentation for Audio Browsers , 1997 .

[13]  M. J. Cheng,et al.  Comparative performance study of several pitch detection algorithms , 1975 .

[14]  Keith D. Martin,et al.  TOWARD AUTOMATIC SOUND SOURCE RECOGNITION: IDENTIFYING MUSICAL INSTRUMENTS , 1998 .

[15]  Richard F. Lyon,et al.  On the importance of time—a temporal representation of sound , 1993 .

[16]  Alexander Dekhtyar,et al.  Information Retrieval , 2018, Lecture Notes in Computer Science.

[17]  Douglas Keislar,et al.  Content-Based Classification, Search, and Retrieval of Audio , 1996, IEEE Multim..

[18]  John S. Boreczky,et al.  A hidden Markov model framework for video segmentation using audio and image features , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[19]  Eric D. Scheirer,et al.  Tempo and beat analysis of acoustic musical signals. , 1998, The Journal of the Acoustical Society of America.

[20]  Daniel Patrick Whittlesey Ellis,et al.  Prediction-driven computational auditory scene analysis , 1996 .

[21]  Malcolm Slaney,et al.  A critique of pure audition , 1998 .

[22]  Alexander G. Hauptmann,et al.  Informedia: news-on-demand multimedia information acquisition and retrieval , 1997 .

[23]  George Tzanetakis,et al.  Multifeature audio segmentation for browsing and annotation , 1999, Proceedings of the 1999 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics. WASPAA'99 (Cat. No.99TH8452).

[24]  Ichiro Fujinaga,et al.  Machine recognition of timbre using steady-state tone of acoustic musical instruments , 1998, ICMC.