Noise robust bird song detection using syllable pattern-based hidden Markov models

This paper studies the temporal, spectral, and structural characteristics of Robin songs and syllables. Syllables in Robin songs are clustered using a distance measure defined as the average of linear predictive coding (LPC)-based frame-level differences, computed after the two syllables are aligned. The syllable patterns inferred from the clustering results are then used to improve the acoustic modelling of a hidden Markov model (HMM)-based Robin song detector. Experiments on the noisy Rocky Mountain Biological Laboratory Robin (RMBL-Robin) song corpus, which contains more than 75 minutes of recordings, show that the syllable pattern-based detector achieves a higher hit rate and a lower false alarm rate than a detector that uses a single general model trained on all the syllables.
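As a rough illustration of the kind of syllable distance described above, the following Python sketch averages frame-level differences along an alignment path. It is a sketch under stated assumptions, not the paper's implementation: the abstract does not specify the alignment method or the frame-level difference, so dynamic time warping and a Euclidean distance between per-frame feature vectors are used here as stand-ins. The function name `aligned_average_distance` is hypothetical.

```python
import numpy as np


def aligned_average_distance(a, b):
    """Average frame-level distance along a DTW alignment path.

    a, b: arrays of shape (n_frames, n_features), e.g. LPC-derived
    features per frame. DTW and the Euclidean frame distance are
    assumptions made for this sketch.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)

    # Pairwise Euclidean distance between every frame of a and of b.
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)

    # Accumulated cost with the standard three-way DTW recursion.
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = local[i - 1, j - 1] + min(
                cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1]
            )

    # Backtrack to count how many frame pairs lie on the optimal path.
    i, j, steps = n, m, 1
    while (i, j) != (1, 1):
        candidates = [
            (cost[i - 1, j - 1], i - 1, j - 1),
            (cost[i - 1, j], i - 1, j),
            (cost[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        steps += 1

    # Normalise the total path cost by the number of aligned frame pairs.
    return cost[n, m] / steps
```

Dividing by the path length keeps the measure comparable across syllables of different durations, which matters if the pairwise distances are fed to a clustering step.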
