Audio recognition in the wild: Static and dynamic classification on a real-world database of animal vocalizations

We present a study on purely data-based recognition of animal sounds, evaluated on a real-world database obtained from the Humboldt-University Animal Sound Archive. Because we avoid preselecting friendly cases, the classifiers must discriminate between species regardless of the age or stance of the animal. We define classification tasks that can be useful for information retrieval and indexing, facilitating the categorization of large sound archives. On these tasks, we compare dynamic and static classification using left-right and cyclic Hidden Markov Models, recurrent neural networks with Long Short-Term Memory, and Support Vector Machines, and we also compare different features commonly found in sound classification and speech recognition. We achieve up to 81.3% accuracy on a 2-class task and 64.0% on a 5-class task.
