Automatic detection of bird species from audio field recordings using HMM-based modelling of frequency tracks

This paper presents an automatic system for detecting bird species in field recordings. A sinusoidal detection algorithm is employed to segment the acoustic scene into isolated spectro-temporal segments. Each segment is represented as a temporal sequence of frequencies of the detected sinusoid, referred to as a frequency track. Each bird species is represented by a set of hidden Markov models (HMMs), with each HMM modelling an individual type of bird vocalisation element. These HMMs are obtained in an unsupervised manner. Detection is based on a likelihood ratio of the test utterance against the target bird species model and a non-target background model. We explore the selection of the cohort used to build the background model, z-norm and t-norm score normalisation techniques, and score compensation to deal with outlier data. Experiments are performed using over 40 hours of audio field recordings from 48 bird species, plus an additional 16 hours of field recordings as impostor trials. Evaluations are performed using detection error trade-off plots. An equal error rate of 5% is achieved when the impostor trials are vocalisations of non-target bird species, and 1.2% when they are field recordings that do not contain bird vocalisations.
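
The scoring and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the background score is assumed to be the mean of the cohort log-likelihoods, z-norm and t-norm follow their usual definitions from speaker verification, and the equal error rate is approximated by a simple threshold sweep.

```python
from statistics import mean, pstdev


def llr(target_loglik, cohort_logliks):
    """Log-likelihood ratio of target vs. background.

    Assumption: the background score is the mean cohort log-likelihood.
    """
    return target_loglik - mean(cohort_logliks)


def z_norm(score, impostor_scores):
    """Z-norm: normalise with the mean and standard deviation of scores
    obtained by testing the target model on impostor utterances."""
    return (score - mean(impostor_scores)) / pstdev(impostor_scores)


def t_norm(score, cohort_scores):
    """T-norm: normalise with the mean and standard deviation of scores
    obtained by testing the same utterance on a set of cohort models."""
    return (score - mean(cohort_scores)) / pstdev(cohort_scores)


def equal_error_rate(target_scores, impostor_scores):
    """Approximate EER: sweep every observed score as a threshold and
    return the operating point where miss and false-alarm rates are closest."""
    best_gap, eer = float("inf"), None
    for thr in sorted(set(target_scores) | set(impostor_scores)):
        miss = sum(s < thr for s in target_scores) / len(target_scores)
        fa = sum(s >= thr for s in impostor_scores) / len(impostor_scores)
        gap = abs(miss - fa)
        if gap < best_gap:
            best_gap, eer = gap, (miss + fa) / 2
    return eer
```

The two normalisations share the same formula and differ only in where the reference scores come from: z-norm statistics can be computed offline per target model, whereas t-norm statistics are computed at test time for each utterance.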
