A Study of the Application of an Average Energy Entropy Method for the Endpoint Extraction of Frog Croak Syllables

【Summary】 Energy-based endpoint detection is commonly used in time-domain analysis to extract the speech segments of a signal and so reduce the amount of computation required. However, noise interference can cause this approach to extract incorrect segments, which significantly impairs recognition when analyzing sound files recorded in the wild. Entropy-based endpoint detection, in contrast, suppresses noise better. Unfortunately, background noise with a non-stationary frequency distribution causes drastic fluctuations in the entropy values of silent segments, which weakens endpoint detection. This paper proposes average energy entropy (AEE) endpoint detection to address these issues and compares the AEE method with three other endpoint detection methods: energy-based, zero-crossing rate, and entropy-based detection. In experiments on frog voice-print recognition, 18 types of frog croaks recorded in the wild were analyzed. The results showed that the AEE method had the best endpoint extraction capability of the four methods, and that its recognition performance was highest when it was used together with linear predictive cepstral coefficients, Mel-frequency cepstral coefficients, and the dynamic time warping algorithm.
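The two baseline features named in the summary, short-time energy and spectral entropy, can be illustrated with a minimal sketch. It is not the AEE method, whose definition the summary does not give. The frame size, the threshold of 10% of the peak frame energy, the synthetic test signal, and all function names are assumptions chosen for illustration only.

```python
import cmath
import math
import random


def split_frames(signal, size, hop):
    """Split a sample list into fixed-size frames (a trailing partial frame is dropped)."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def frame_energy(frame):
    """Short-time energy: sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def spectral_entropy(frame):
    """Shannon entropy (in nats) of the normalized power spectrum, using a naive DFT."""
    n = len(frame)
    power = []
    for k in range(1, n // 2):  # positive-frequency bins, DC excluded
        s = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    if total == 0.0:
        # Assumption: treat an all-zero frame as having a flat spectrum.
        return math.log(len(power))
    return -sum((p / total) * math.log(p / total) for p in power if p > 0.0)


def segments_from_flags(flags):
    """Convert per-frame booleans into (start, end) frame index pairs, end exclusive."""
    segments, start = [], None
    for i, active in enumerate(flags):
        if active and start is None:
            start = i
        elif not active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(flags)))
    return segments


# Synthetic signal: 64 silent samples, 64 samples of a pure tone, 64 silent samples.
FRAME = 32
tone = [math.sin(2 * math.pi * 4 * t / FRAME) for t in range(64)]
signal = [0.0] * 64 + tone + [0.0] * 64

frame_list = split_frames(signal, FRAME, FRAME)
energies = [frame_energy(f) for f in frame_list]
threshold = 0.1 * max(energies)  # assumed threshold, not taken from the paper
segments = segments_from_flags([e > threshold for e in energies])
print(segments)  # [(2, 4)]

# A pure tone concentrates power in one bin (low entropy); noise spreads it out.
rng = random.Random(0)
noise_frame = [rng.uniform(-1.0, 1.0) for _ in range(FRAME)]
tone_entropy = spectral_entropy(frame_list[2])
noise_entropy = spectral_entropy(noise_frame)
```

On this clean signal the energy threshold recovers the tonal frames exactly. The entropy comparison shows why a tonal call stands out from broadband noise: its spectrum is concentrated, so its entropy is much lower than that of the noise frame.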
