Local Wavelet Acoustic Pattern: A Novel Time–Frequency Descriptor for Birdsong Recognition

Investigating the identity, distribution, and evolution of bird species is important for both biodiversity assessment and environmental conservation. The discrete wavelet transform (DWT) has been widely used to extract time–frequency features for acoustic signal analysis. Traditional approaches usually compute statistical measures (e.g., maximum, mean, standard deviation) of the DWT coefficients in each subband independently to yield the feature descriptor, without considering the intersubband correlation. A new acoustic descriptor, called the local wavelet acoustic pattern (LWAP), is proposed to characterize the correlation of the DWT coefficients in different subbands for birdsong recognition. First, we divide a variable-length birdsong segment into a number of fixed-duration texture windows. For each texture window, several LWAP descriptors are extracted. The vector of locally aggregated descriptors (VLAD) is then used to aggregate the set of LWAP descriptors into a single VLAD vector. Finally, principal component analysis (PCA) plus linear discriminant analysis (LDA) are employed to reduce the feature dimensionality for classification purposes. Experiments on two birdsong datasets show that the proposed LWAP descriptor outperforms other local descriptors, including linear predictive coding cepstral coefficients, Mel-frequency cepstral coefficients, perceptual linear prediction cepstral coefficients, chroma features, and prosody features. Furthermore, the proposed LWAP descriptor, followed by VLAD encoding, PCA plus LDA feature extraction, and a simple distance-based classifier, yields results that are competitive with those obtained by state-of-the-art convolutional neural networks.
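The windowing and aggregation steps described above can be illustrated with a minimal sketch. It assumes the standard VLAD formulation (nearest-codeword assignment, per-codeword residual sums, global L2 normalization); the abstract does not specify the exact variant, the codebook training procedure, or the LWAP computation itself, so the descriptors and the two-entry codebook below are hypothetical toy values, and the function names `texture_windows` and `vlad_encode` are illustrative only. Plain Python is used to keep the example dependency-free; in practice the codebook would typically be learned with k-means.

```python
import math


def texture_windows(signal, win_len, hop):
    """Split a variable-length sequence into fixed-length windows.

    A trailing remainder shorter than win_len is dropped (an assumption;
    the abstract does not say how the tail is handled).
    """
    return [signal[s:s + win_len]
            for s in range(0, len(signal) - win_len + 1, hop)]


def vlad_encode(descriptors, codebook):
    """Aggregate a set of local descriptors into a single VLAD vector.

    Each descriptor is assigned to its nearest codeword, the residuals
    (descriptor - codeword) are summed per codeword, and the concatenated
    sums are L2-normalized.
    """
    dim = len(codebook[0])
    residual_sums = [[0.0] * dim for _ in codebook]
    for d in descriptors:
        # Index of the nearest codeword by squared Euclidean distance.
        k = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(d, codebook[i])),
        )
        for j in range(dim):
            residual_sums[k][j] += d[j] - codebook[k][j]
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat


# Toy data: placeholders standing in for LWAP descriptors of one window.
codebook = [[0.0, 0.0], [10.0, 10.0]]
descriptors = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]

vlad = vlad_encode(descriptors, codebook)
windows = texture_windows(list(range(10)), win_len=4, hop=2)
```

The resulting vector has length (number of codewords) × (descriptor dimension), independent of how many descriptors a window produced, which is what makes it a suitable fixed-size input for the subsequent PCA plus LDA stage.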
