A comparison of broad phonetic and acoustic units for noise robust segment-based phonetic recognition

In this paper, we compare speech recognition performance using broad phonetically- and acoustically-motivated units as a pre-processor in designing a novel noise-robust landmark detection and segmentation algorithm. We introduce a cluster evaluation method to measure the quality of acoustic unit clusters. On the noisy TIMIT task, we find that the acoustic and phonetic segmentation approaches offer significant improvements over two baseline methods used in the SUMMIT segment-based speech recognizer: a sinusoidal model method and a spectral change approach. In addition, we find that the acoustic method is much faster to compute in stationary noise, while the phonetic approach is faster in non-stationary noise conditions.
