Speaker Independent Phoneme Recognition Based on Fisher Weight Map

We previously proposed a feature extraction method based on higher-order local auto-correlation (HLAC) and a Fisher weight map (FWM) at Interspeech 2006 [6]. This paper shows the effectiveness of the proposed FWM in both speaker-dependent and speaker-independent phoneme recognition. The widely used MFCC (Mel-frequency cepstral coefficient) features lack temporal dynamics. To address this problem, local auto-correlation features are computed and then accumulated with weights that are high in the discriminative regions. This weight map is called the Fisher weight map. In speaker-dependent phoneme recognition, the proposed FWM achieved a recognition rate of 79.5%, 5.0 points higher than MFCC. Combining FWM with MFCC and ΔMFCC further improved the recognition rate to 88.3%. In speaker-independent phoneme recognition, FWM achieved 84.2%, 11.0 points higher than MFCC, and combining FWM with MFCC and ΔMFCC improved the recognition rate to 89.0%.
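The abstract describes the feature extraction only in words. What follows is a minimal, stdlib-only Python sketch of the general idea under explicit assumptions. The input is assumed to be a 2-D time-frequency array. Local HLAC products are computed at each interior cell and then summed with a weight map `w` instead of uniformly. `w` is chosen to maximize an assumed trace-ratio Fisher criterion, `w^T B w / w^T (W + eps I) w`. The mask subset, the regularizer `eps`, the use of power iteration in place of a generalized eigensolver, and every function name (`local_hlac`, `weighted_feature`, `scatter_matrices`, `fisher_weight_map`) are hypothetical choices made for illustration. This is not the authors' implementation.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]

# ASSUMPTION: an illustrative subset of HLAC masks, as (time, frequency)
# displacements from the reference cell: the order-0 mask and the four
# order-1 masks in a 3x3 neighbourhood. The paper's mask set may differ.
MASKS: List[List[Tuple[int, int]]] = [[], [(0, 1)], [(1, -1)], [(1, 0)], [(1, 1)]]


def local_hlac(spec: Matrix) -> Matrix:
    """One row per interior cell r of a time-frequency array, one column
    per mask: f(r) * f(r + a_1) * ... (HLAC before spatial summation)."""
    rows = []
    for t in range(1, len(spec) - 1):
        for f in range(1, len(spec[0]) - 1):
            row = []
            for mask in MASKS:
                v = spec[t][f]
                for dt, df in mask:
                    v *= spec[t + dt][f + df]
                row.append(v)
            rows.append(row)
    return rows


def weighted_feature(X: Matrix, w: List[float]) -> List[float]:
    """Accumulate local features with a weight map over positions, y = X^T w.
    With w = [1, ..., 1] this is the ordinary, uniformly summed HLAC vector."""
    return [sum(w[m] * X[m][k] for m in range(len(X))) for k in range(len(X[0]))]


def _outer(D: Matrix) -> Matrix:
    """D D^T for an (M x K) matrix D, giving an (M x M) matrix."""
    return [[sum(a * b for a, b in zip(di, dj)) for dj in D] for di in D]


def scatter_matrices(samples: List[Matrix], labels: List[int]) -> Tuple[Matrix, Matrix]:
    """Between-class B and within-class W (both M x M) such that w^T B w and
    w^T W w are the traces of the class scatters of y = X^T w."""
    M, K = len(samples[0]), len(samples[0][0])

    def mean(mats: List[Matrix]) -> Matrix:
        return [[sum(X[m][k] for X in mats) / len(mats) for k in range(K)] for m in range(M)]

    def accumulate(S: Matrix, D: Matrix, scale: float) -> None:
        O = _outer(D)
        for i in range(M):
            for j in range(M):
                S[i][j] += scale * O[i][j]

    overall = mean(samples)
    class_mean: Dict[int, Matrix] = {
        c: mean([X for X, lab in zip(samples, labels) if lab == c]) for c in set(labels)
    }
    B = [[0.0] * M for _ in range(M)]
    W = [[0.0] * M for _ in range(M)]
    for X, lab in zip(samples, labels):
        accumulate(W, [[X[m][k] - class_mean[lab][m][k] for k in range(K)] for m in range(M)], 1.0)
    for c, mu in class_mean.items():
        accumulate(B, [[mu[m][k] - overall[m][k] for k in range(K)] for m in range(M)], labels.count(c))
    return B, W


def _solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fisher_weight_map(samples: List[Matrix], labels: List[int],
                      eps: float = 1e-3, iters: int = 100) -> List[float]:
    """Unit-norm weight map maximising w^T B w / w^T (W + eps I) w, via power
    iteration on (W + eps I)^-1 B instead of a generalised eigensolver."""
    B, W = scatter_matrices(samples, labels)
    M = len(B)
    w_reg = [[W[i][j] + (eps if i == j else 0.0) for j in range(M)] for i in range(M)]
    w = [1.0] * M  # start from the uniform map, i.e. plain HLAC
    for _ in range(iters):
        z = _solve(w_reg, [sum(B[i][j] * w[j] for j in range(M)) for i in range(M)])
        norm = math.sqrt(sum(v * v for v in z))
        if norm == 0.0:
            raise ValueError("start vector lies in the null space of B")
        w = [v / norm for v in z]
    return w
```

In this sketch, `fisher_weight_map(samples, labels)` takes one `local_hlac(...)` matrix per labelled training token, all of the same size. The learned map is then applied to any token with `weighted_feature(local_hlac(spec), w)`. Starting the iteration from the all-ones map means the result scores at least as well as plain, uniformly summed HLAC under the assumed criterion.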

[1] Nobuyuki Otsu et al. Facial expression recognition using Fisher weight maps. 2004, Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004. Proceedings.

[2] Tsuneo Nitta. Feature extraction for speech recognition based on orthogonal acoustic-feature planes and LDA. 1999, 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258).

[3] Phil D. Green et al. Robust automatic speech recognition with missing and unreliable acoustic data. 2001, Speech Communication.

[4] James R. Glass et al. Speech recognition with localized time-frequency pattern detectors. 2007, 2007 IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU).

[5] Mats Blomberg et al. Effects of emphasizing transitional or stationary parts of the speech signal in a discrete utterance recognition system. 1982, ICASSP.

[6] Tetsuya Takiguchi et al. Phoneme recognition based on Fisher weight map to higher-order local auto-correlation. 2006, INTERSPEECH.

[7] Tsuneo Nitta. A novel feature-extraction for speech recognition based on multiple acoustic-feature planes. 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).