Distinctive Phonetic Feature (DPF) Extraction Based on MLNs and Inhibition/Enhancement Network

This paper describes a low-computation-cost method for extracting distinctive phonetic features (DPFs) for use in a phoneme recognition system. The method comprises three stages. The first stage uses two multilayer neural networks (MLNs): MLN_LF-DPF, which maps continuous acoustic features, called local features (LFs), onto discrete DPFs, and MLN_Dyn, which constrains the DPF context at phoneme boundaries. The second stage incorporates inhibition/enhancement (In/En) functionalities that determine whether the dynamic patterns of the DPF trajectories are convex or concave; convex patterns are enhanced and concave patterns are inhibited. The third stage decorrelates the DPF vectors using the Gram-Schmidt orthogonalization procedure before feeding them into a hidden Markov model (HMM)-based classifier. In an experiment on Japanese Newspaper Article Sentences (JNAS) utterances, the proposed feature extractor, which incorporates the two MLNs and the In/En network, provided a higher phoneme correct rate with fewer mixture components in the HMMs.
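To make the third stage more concrete, the following is a minimal sketch of how Gram-Schmidt orthogonalization can decorrelate feature dimensions. It is not the authors' implementation: the abstract does not say which vectors are orthogonalized or in what order, so the sketch assumes a frames-by-dimensions matrix whose mean-centered columns are orthogonalized one after another. The function name `gram_schmidt_columns` and the synthetic `dpf` matrix are hypothetical and exist only for illustration.

```python
import numpy as np


def gram_schmidt_columns(x, eps=1e-12):
    """Orthogonalize the mean-centered columns of x (frames x dimensions).

    Assumption for illustration: each column is one feature dimension
    observed over time. Centered columns that are mutually orthogonal
    have zero sample covariance, so the output dimensions are decorrelated.
    """
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    out = np.zeros_like(centered)
    for j in range(centered.shape[1]):
        w = centered[:, j].copy()
        # Remove the component of column j along every earlier output column.
        for k in range(j):
            denom = np.dot(out[:, k], out[:, k])
            if denom > eps:  # skip columns that collapsed to (near) zero
                w -= (np.dot(w, out[:, k]) / denom) * out[:, k]
        out[:, j] = w
    return out


# Synthetic, deliberately correlated "feature" columns (not real DPF data).
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
dpf = np.column_stack([
    base[:, 0],
    base[:, 0] + 0.5 * base[:, 1],
    base[:, 1] + 0.2 * base[:, 2],
])

ortho = gram_schmidt_columns(dpf)
cov = np.cov(ortho, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))
print("largest off-diagonal covariance:", np.max(np.abs(off_diag)))
```

The first column passes through unchanged apart from centering, and each later column keeps only the part not explained by the columns before it, so the result depends on column order. A decorrelated input of this kind is a natural fit for HMMs whose Gaussian mixtures use diagonal covariance matrices, although the abstract does not state that this is the configuration used.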
