Robust Phoneme Recognizer at Noise Corrupted Acoustic Environment

This paper proposes a robust automatic phoneme recognizer (APR) for the Japanese language in noise-corrupted acoustic environments. Previous studies have found that hidden factors such as speaking style, gender effects, and noisy acoustic environments degrade the performance of APRs. In this study, an APR is designed for noise-corrupted acoustic environments, with the aim of resolving the effect of noise. The proposed system comprises three stages. In the first stage, a multilayer neural network (MLN) outputs Distinctive Phonetic Features (DPFs) from the input acoustic features. In the second stage, the Karhunen-Loeve Transformation (KLT) and the Gram-Schmidt (GS) algorithm are used to extract a reduced feature vector. Finally, the reduced features are fed into a hidden Markov model (HMM) based classifier, which generates the output phoneme strings. Experiments in clean and noisy acoustic environments show that the proposed method provides higher recognition accuracy at lower signal-to-noise ratios (SNRs).
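
To make the second stage more concrete, the following is a minimal sketch of the two named tools, the KLT and Gram-Schmidt orthogonalization, applied to a matrix of DPF vectors. The abstract does not state how the two are combined, so they are shown here as separate utilities. The function names, the 15-dimensional DPF vector, the 5-dimensional reduced vector, and the random stand-in data are all assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def klt_basis(features, k):
    """Estimate a KLT basis: the mean and the top-k covariance eigenvectors.

    features: array of shape (n_frames, n_dims), one feature vector per row.
    """
    mean = features.mean(axis=0)
    cov = np.cov(features - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    return mean, eigvecs[:, order]


def gram_schmidt(vectors):
    """Orthonormalize the columns of `vectors` (modified Gram-Schmidt)."""
    vectors = np.asarray(vectors, dtype=float)
    q = np.zeros_like(vectors)
    for j in range(vectors.shape[1]):
        v = vectors[:, j].copy()
        for i in range(j):
            # Remove the component along each previously accepted direction.
            v -= (q[:, i] @ v) * q[:, i]
        norm = np.linalg.norm(v)
        if norm < 1e-12:
            raise ValueError("columns are linearly dependent")
        q[:, j] = v / norm
    return q


def project(features, mean, basis):
    """Project centered feature vectors onto the columns of `basis`."""
    return (features - mean) @ basis


# Hypothetical stand-in for MLN output: 200 frames of 15 DPF values in [0, 1).
rng = np.random.default_rng(0)
dpf = rng.random((200, 15))

mean, basis = klt_basis(dpf, k=5)
reduced = project(dpf, mean, basis)             # shape (200, 5)

# Gram-Schmidt on an arbitrary (non-orthogonal) set of 5 directions.
ortho = gram_schmidt(rng.random((15, 5)))
```

In a full system, vectors like `reduced` would serve as the observation vectors of the HMM-based classifier in the third stage; that stage, and the MLN that produces the DPFs, are outside the scope of this sketch.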
