Automatic Recognition of an Unknown and Time-Varying Number of Simultaneous Environmental Sound Sources

This work addresses the automatic enumeration and recognition of an unknown and time-varying number of environmental sound sources using a single microphone. We assume that the recorded sound is a realization of sound sources belonging to a set of audio classes that is known a priori. We describe two variations of the same principle: computing the distance between the current unknown audio frame and every possible combination of the classes assumed to span the sound scene. We focus on categorizing environmental sound sources, such as birds and insects, for the task of monitoring the biodiversity of a specific habitat.

Keywords: automatic recognition of multiple sound sources, enumeration of sound sources, computational ecology.
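The combination-search principle described above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the features, the class models, or the distance measure, so the sketch assumes that each class is represented by a single fixed spectral template, that simultaneous sources add in the power-spectrum domain, and that squared Euclidean distance is used. The function name `best_combination`, the `templates` dictionary, and all numeric values are hypothetical.

```python
from itertools import combinations


def best_combination(frame, templates, max_sources=None):
    """Return the subset of class names whose summed templates lie closest to `frame`.

    Assumptions (not taken from the paper): one fixed template per class,
    additive mixing of templates, squared Euclidean distance.
    """
    names = sorted(templates)
    limit = len(names) if max_sources is None else min(max_sources, len(names))

    # Start from the empty combination (no active source, i.e. an all-zero mixture).
    best_subset = ()
    best_dist = sum(x * x for x in frame)

    # Exhaustively score every non-empty combination of up to `limit` classes.
    for k in range(1, limit + 1):
        for subset in combinations(names, k):
            mixture = [sum(templates[n][i] for n in subset) for i in range(len(frame))]
            dist = sum((f - m) ** 2 for f, m in zip(frame, mixture))
            if dist < best_dist:
                best_subset, best_dist = subset, dist
    return best_subset, best_dist


# Hypothetical 4-bin power-spectrum templates for three classes.
templates = {
    "bird": [0.0, 1.0, 4.0, 1.0],
    "insect": [3.0, 0.5, 0.0, 0.0],
    "rain": [1.0, 1.0, 1.0, 1.0],
}

# A frame that is roughly the sum of the "bird" and "insect" templates.
frame = [3.1, 1.4, 4.0, 1.1]
active, dist = best_combination(frame, templates)
print(active, len(active))  # the recognized classes and the estimated source count
```

Applying the function to each frame in turn yields a source count and a set of labels that can change over time. The exhaustive search grows as 2^N in the number of classes N, which is why the sketch exposes the optional `max_sources` cap.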
