Automatic Birdsong Recognition with MFCC Based Syllable Feature Extraction

In this study, an automatic birdsong recognition system based on syllable features was developed. After syllable segmentation, three syllable features, namely mean, QI and QE, were computed from the MFCCs of each syllable; these features aim to capture variations in time as well as amplitude transitions of the MFCC sequences. Using fuzzy c-means (FCM) clustering and linear discriminant analysis (LDA), the resulting feature vector was used to construct an automatic birdsong recognition system, which was applied to a birdsong database with 420 bird species.
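
As a rough illustration of two stages named in the abstract, the sketch below computes the mean feature of a syllable from its MFCC matrix and then groups syllable feature vectors with a textbook fuzzy c-means loop. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the fuzzifier m = 2, the 13-coefficient MFCCs and the synthetic data are all hypothetical. QI and QE are not defined in the abstract, so they are left out, and the LDA stage is not shown.

```python
import numpy as np


def syllable_mean_feature(mfcc):
    """Average an (n_frames, n_coeffs) MFCC matrix over time (hypothetical helper)."""
    return np.asarray(mfcc, dtype=float).mean(axis=0)


def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means; returns (centers, memberships).

    m is the fuzzifier (assumed 2.0 here); memberships has one row per
    sample and one column per cluster, and each row sums to 1.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from random memberships normalised so each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the samples.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


# Synthetic stand-in for segmented syllables: 20 frames x 13 MFCCs each,
# drawn around two different levels (purely illustrative data).
rng = np.random.default_rng(1)
levels = [0.0] * 5 + [3.0] * 5
syllables = [rng.normal(level, 0.1, size=(20, 13)) for level in levels]

features = np.stack([syllable_mean_feature(s) for s in syllables])
centers, U = fuzzy_c_means(features, n_clusters=2)
labels = U.argmax(axis=1)  # hard assignment from the soft memberships
```

In a full system the feature vector would also carry QI and QE, and the per-cluster prototypes would feed a discriminant stage such as LDA. Here the soft memberships in `U` only show how FCM assigns each syllable a degree of belonging to every cluster.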
