Dynamic time warping and sparse representation classification for birdsong phrase classification using limited training data.

Annotating the phrases in birdsongs is useful for behavioral and population studies. To reduce the need for manual annotation, this paper develops an automated birdsong phrase classification algorithm for settings with limited training data, which arise when few recordings are available or when some phrases are rare. Up to 81 phrase classes of Cassin's Vireo are classified using one to five training samples per class. The algorithm combines dynamic time warping (DTW) with two passes of sparse representation (SR) classification. DTW improves the similarity between training and test phrases from the same class in the presence of differences between individual birds and inconsistencies in phrase segmentation. The SR classifier finds a sparse linear combination of training feature vectors from all classes that best approximates the test feature vector. When the class decisions from DTW and the first-pass SR classification differ, SR classification is repeated using only the training samples from the two conflicting classes. Compared to DTW alone, support vector machines, and an SR classifier without DTW, the proposed classifier achieves the highest classification accuracies: 94% on manually segmented phrases and 89% on automatically segmented phrases from unseen Cassin's Vireo individuals, using five training samples per class.
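
The decision logic described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`dtw_distance`, `sparse_code`, `sr_classify`, `two_pass_classify`), the ISTA solver used for the sparse coding step, the regularization weight `lam`, and the split into per-phrase frame sequences (for DTW) and fixed-length feature vectors (for SR) are all assumptions made for illustration. The abstract does not specify the features, the solver, or how DTW output feeds the SR classifier.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW cost between two sequences of shape (frames, dims)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def sparse_code(D, y, lam=0.01, n_iter=500):
    """Approximate argmin_x 0.5*||y - D x||^2 + lam*||x||_1 using ISTA (assumed solver)."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = x - step * (D.T @ (D @ x - y))                         # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam * step, 0.0)   # soft threshold
    return x

def sr_classify(D, labels, y, lam=0.01):
    """Return the class whose own training columns reconstruct y with the smallest residual."""
    x = sparse_code(D, y, lam)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(y - D[:, labels == c] @ x[labels == c]) for c in classes]
    return classes[int(np.argmin(residuals))]

def two_pass_classify(train_seqs, train_vecs, labels, test_seq, test_vec):
    """DTW nearest neighbor plus SR; on disagreement, rerun SR on the two conflicting classes."""
    labels = np.asarray(labels)
    y = np.asarray(test_vec, dtype=float)
    D = np.asarray(train_vecs, dtype=float).T           # one training sample per column
    D = D / np.linalg.norm(D, axis=0, keepdims=True)    # assumes no all-zero column
    dtw_class = labels[int(np.argmin([dtw_distance(s, test_seq) for s in train_seqs]))]
    sr_class = sr_classify(D, labels, y)
    if dtw_class == sr_class:
        return sr_class
    keep = (labels == dtw_class) | (labels == sr_class)  # second pass: two classes only
    return sr_classify(D[:, keep], labels[keep], y)
```

With one training sequence and one feature vector per phrase, `two_pass_classify(train_seqs, train_vecs, labels, test_seq, test_vec)` returns a single class label. The second SR pass is only triggered when the DTW nearest-neighbor decision and the first SR decision disagree, which mirrors the two-pass structure stated in the abstract.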
