PCA-based feature extraction for fluctuation in speaking style of articulation disorders

We investigate speech recognition for a person with articulation disorders resulting from athetoid cerebral palsy. The accuracy of speaker-independent speech recognition has recently improved remarkably through stochastic modeling of speech. However, such acoustic models perform poorly for speakers with atypical speech styles (e.g., articulation disorders). In this paper, we describe our efforts to build an acoustic model for a person with articulation disorders. The articulation of the first utterance tends to become unstable because of muscle strain, which degrades recognition accuracy. We therefore propose a robust feature extraction method based on principal component analysis (PCA) in place of MFCC. Its effectiveness is confirmed by word recognition experiments.

Index Terms: articulation disorders, PCA, feature extraction
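
The abstract does not spell out the feature pipeline, so the following is only a minimal sketch of one plausible reading: PCA is fitted on log filter-bank energies and the leading components are used as features, taking the place of the DCT step used in MFCC. The function names, the 24-channel filter bank, the 12 retained components, and the random placeholder data are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def fit_pca(log_fbank, n_components):
    """Fit PCA on a (frames x channels) matrix of log filter-bank energies.

    Returns the channel mean, the projection basis (channels x n_components),
    and the variance captured by each retained component.
    """
    mean = log_fbank.mean(axis=0)
    centered = log_fbank - mean
    # Sample covariance between filter-bank channels (ddof=1 by default).
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_components]
    return mean, eigvecs[:, order], eigvals[order]


def pca_features(log_fbank, mean, basis):
    """Project log filter-bank frames onto the retained principal components."""
    return (log_fbank - mean) @ basis


# Placeholder data standing in for log mel filter-bank energies
# (500 frames, 24 channels); real input would come from a speech front end.
rng = np.random.default_rng(0)
train = rng.normal(size=(500, 24))

mean, basis, variances = fit_pca(train, n_components=12)
features = pca_features(train, mean, basis)
```

In this sketch the basis is estimated once from training frames and then reused for every new frame, so the projection is a fixed linear transform, much as the DCT is in MFCC extraction, except that it is learned from the speaker's data.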
