Robust feature representation for classification of bird song syllables

A novel feature set for low-dimensional signal representation is presented, designed for classification or clustering of non-stationary signals with complex variation in time and frequency. The feature representation of a signal is given by the first left and right singular vectors of its ambiguity spectrum matrix. If the ambiguity matrix is of low rank, most of the signal information in the time direction is captured by the first right singular vector, while the signal’s key frequency information is encoded by the first left singular vector. The resemblance of two signals is assessed by applying a suitable similarity measure to their respective singular vector pairs. Computing the ambiguity spectrum with multitapers increases robustness to jitter and background noise, and consequently improves performance, compared with estimation based on the ordinary single Hanning-window spectrogram. The suggested feature-based signal compression is applied to a syllable-based analysis of the song of the Great Reed Warbler and evaluated by comparison with manual auditory and/or visual signal classification. The results show that the proposed approach outperforms well-known approaches based on mel-frequency cepstral coefficients and spectrogram cross-correlation.
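The pipeline summarized above (multitaper spectrogram, ambiguity spectrum, first singular vector pair, similarity score) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Hermite-function tapers with equal weights, taking the ambiguity spectrum as the magnitude of the 2-D DFT of the multitaper spectrogram (the 2-D Fourier transform of a spectrogram is the signal's ambiguity function weighted by the window's ambiguity kernel), and the product-of-cosines similarity are all assumptions. The helper names `hermite_tapers`, `multitaper_spectrogram`, `ambiguity_spectrum`, `singular_vector_pair`, `similarity`, `features` and `embed` are hypothetical, and syllables are assumed to be placed in equal-length frames so that all feature vectors have the same size.

```python
import numpy as np


def hermite_tapers(win_len, n_tapers, scale):
    """Hypothetical helper: first n_tapers Hermite functions sampled on win_len points."""
    t = (np.arange(win_len) - (win_len - 1) / 2) / scale
    # Physicists' Hermite polynomials: H_0 = 1, H_1 = 2t, H_{k+1} = 2t H_k - 2k H_{k-1}
    polys = [np.ones_like(t), 2 * t]
    for k in range(1, n_tapers - 1):
        polys.append(2 * t * polys[k] - 2 * k * polys[k - 1])
    tapers = np.array(polys[:n_tapers]) * np.exp(-t**2 / 2)
    # Unit energy per taper; the sampled functions are only approximately orthogonal
    return tapers / np.linalg.norm(tapers, axis=1, keepdims=True)


def multitaper_spectrogram(x, tapers, hop, n_fft):
    """Average of the tapered periodograms in each frame (equal weights assumed)."""
    win_len = tapers.shape[1]
    n_frames = 1 + (len(x) - win_len) // hop
    S = np.empty((n_fft // 2 + 1, n_frames))
    for m in range(n_frames):
        seg = x[m * hop : m * hop + win_len]
        periodograms = np.abs(np.fft.rfft(tapers * seg, n=n_fft, axis=1)) ** 2
        S[:, m] = periodograms.mean(axis=0)
    return S  # rows: frequency bins, columns: time frames


def ambiguity_spectrum(S):
    """Magnitude of the 2-D DFT of the spectrogram (frequency axis -> lag, time axis -> Doppler)."""
    return np.abs(np.fft.fftshift(np.fft.fft2(S)))


def singular_vector_pair(A):
    """First left and right singular vectors of the ambiguity matrix."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    u1, v1 = U[:, 0], Vt[0]
    # For a nonnegative matrix the leading pair can be taken nonnegative;
    # flip both signs together so that vectors are comparable across signals.
    if u1.sum() < 0:
        u1, v1 = -u1, -v1
    return u1, v1


def similarity(pair_a, pair_b):
    """Hypothetical score: product of |cosine| similarities of the left and right vectors."""
    (ua, va), (ub, vb) = pair_a, pair_b
    return float(abs(ua @ ub) * abs(va @ vb))


def features(x, tapers, hop=64, n_fft=512):
    S = multitaper_spectrogram(x, tapers, hop, n_fft)
    return singular_vector_pair(ambiguity_spectrum(S))


def embed(syllable, offset, total_len=8192):
    """Place a syllable in a zero background (hypothetical test setup)."""
    x = np.zeros(total_len)
    x[offset : offset + len(syllable)] = syllable
    return x


n = np.arange(1024)
chirp = np.sin(2 * np.pi * (0.05 * n + 1e-4 * n**2))  # rising chirp, 0.05 to ~0.25 cycles/sample
tone = np.sin(2 * np.pi * 0.2 * n)                     # constant-frequency syllable

tapers = hermite_tapers(win_len=256, n_tapers=4, scale=25.6)
ref = features(embed(chirp, 1024), tapers)
jittered = features(embed(chirp, 1024 + 10 * 64), tapers)  # same syllable, shifted by 10 hops
other = features(embed(tone, 1024), tapers)

sim_same = similarity(ref, jittered)
sim_other = similarity(ref, other)
print(f"chirp vs shifted chirp: {sim_same:.4f}")
print(f"chirp vs tone:          {sim_other:.4f}")
```

Because the syllable is shifted by a whole number of hops inside a zero background, the two spectrograms are exact circular shifts of each other, so the magnitude of their 2-D DFTs, and hence the singular vector pair, is unchanged up to rounding. Real jitter is not an exact circular shift, so in practice this invariance is only approximate.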
