Consonant enhancement for articulation disorders based on non-negative matrix factorization

We present a consonant enhancement method for the voice of a person with an articulation disorder resulting from athetoid cerebral palsy. The movements of such speakers are limited by their athetoid symptoms, and their consonants are often unstable or unclear, which makes it difficult for them to communicate. Speech recognition for articulation disorders has been studied; however, the recognition rate for such speech is still lower than that for physically unimpaired speakers. In this paper, exemplar-based spectral conversion using non-negative matrix factorization (NMF) is applied to consonant enhancement of a voice with an articulation disorder. The source speaker's spectrum is easily converted into a well-ordered speaker's spectrum. The effectiveness of the method is evaluated in terms of voice quality and consonant clarity for a person with an articulation disorder.
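The abstract does not spell out the conversion procedure, so the following is only a minimal sketch of how exemplar-based spectral conversion with NMF is commonly set up, not the authors' implementation. It assumes two parallel exemplar dictionaries (`A_src` from the source speaker and `A_tgt` from the target speaker, with matching columns), estimates non-negative activations for the source spectrogram with the source dictionary held fixed, and then applies the same activations to the target dictionary. The function names, the KL-divergence multiplicative update, and the `sparsity` weight are illustrative assumptions.

```python
import numpy as np

def estimate_activations(X, A, n_iter=200, sparsity=0.1, eps=1e-12):
    """Estimate non-negative activations H with X ~= A @ H, keeping A fixed.

    Assumption: multiplicative updates for the KL divergence with an L1
    sparsity penalty on H. The abstract does not confirm this cost function.
    X: (n_bins, n_frames) magnitude spectrogram, A: (n_bins, n_exemplars).
    """
    rng = np.random.default_rng(0)            # fixed seed for repeatability
    H = rng.random((A.shape[1], X.shape[1])) + eps
    col_sums = A.sum(axis=0)[:, None]         # A^T 1, shape (n_exemplars, 1)
    for _ in range(n_iter):
        ratio = X / (A @ H + eps)             # element-wise X / (A H)
        H *= (A.T @ ratio) / (col_sums + sparsity)
    return H

def convert_spectrum(X_src, A_src, A_tgt, **kwargs):
    """Convert a source spectrogram using parallel exemplar dictionaries.

    Hypothetical helper: activations found on the source dictionary are
    reused as weights on the target dictionary.
    """
    H = estimate_activations(X_src, A_src, **kwargs)
    return A_tgt @ H, H

# Toy usage with synthetic non-negative data (not real speech).
rng = np.random.default_rng(1)
A_src = rng.random((20, 8))                   # 8 source exemplars, 20 bins
A_tgt = rng.random((24, 8))                   # parallel target exemplars
H_true = rng.random((8, 5)) * (rng.random((8, 5)) > 0.5)
X_src = A_src @ H_true                        # 5 synthetic source frames
Y, H = convert_spectrum(X_src, A_src, A_tgt, sparsity=0.0)
print(Y.shape, H.shape)
```

In this sketch the dictionaries never change during conversion; only the activations are fitted, which is what makes the approach exemplar-based rather than a learned parametric mapping.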
