The self-taught vocal interface

Speech technology is firmly rooted in daily life, most notably in command-and-control (C&C) applications. C&C usability degrades quickly, however, when the user's speech deviates from the norm. We pursue a fully adaptive vocal user interface (VUI) that learns both vocabulary and grammar directly from interaction examples, achieving robustness to non-standard speech by building its models from scratch. This approach raises a feasibility concern: how much training material is needed to reach acceptable recognition accuracy? In previous work, we proposed a VUI based on non-negative matrix factorisation (NMF) that discovers the recurrent acoustic and semantic patterns linking spoken commands to device-specific actions, and demonstrated its effectiveness on unimpaired speech. In this work, we evaluate the feasibility of a self-taught VUI on a new database, domotica-3, which contains dysarthric speech recordings of typical commands in a home automation setting. Additionally, we compare our NMF-based system with a system based on Gaussian mixtures. The evaluation favours our NMF-based approach, which yields usable recognition accuracies for people with dysarthric speech after only a few learning examples. Finally, we propose a multi-layered semantic frame structure and demonstrate its effectiveness in boosting overall performance.
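To make the core idea concrete, the sketch below illustrates the kind of joint factorisation the abstract describes: each utterance becomes a column vector that stacks non-negative acoustic features on top of a semantic (action) vector, and NMF with multiplicative updates finds latent patterns that tie acoustics to actions. This is a minimal illustration with synthetic data and hypothetical dimensions, not the paper's actual pipeline (which uses histogram-of-acoustic-co-occurrence features and semantic frames).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 40 acoustic features, 5 possible device actions,
# 30 training utterances, 5 latent "command" patterns to discover.
n_acoustic, n_semantic, n_utt, n_patterns = 40, 5, 30, 5

# Synthetic non-negative data: columns of V stack [acoustics; semantics],
# generated from a ground-truth dictionary so the factorisation is feasible.
true_W = rng.random((n_acoustic + n_semantic, n_patterns))
true_H = rng.random((n_patterns, n_utt))
V = true_W @ true_H

# NMF via Lee & Seung multiplicative updates, minimising squared error.
W = rng.random((n_acoustic + n_semantic, n_patterns)) + 1e-3
H = rng.random((n_patterns, n_utt)) + 1e-3
for _ in range(300):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Decoding a new utterance: only its acoustic part is observed. Fix the
# acoustic rows of W, solve for the activations h, then read the semantic
# rows of W @ h as soft scores over device actions.
v_acoustic = (true_W @ rng.random((n_patterns, 1)))[:n_acoustic]
W_a = W[:n_acoustic]
h = rng.random((n_patterns, 1)) + 1e-3
for _ in range(200):
    h *= (W_a.T @ v_acoustic) / (W_a.T @ W_a @ h + 1e-9)
action_scores = (W[n_acoustic:] @ h).ravel()
predicted_action = int(np.argmax(action_scores))
```

In the self-learning setting, the semantic rows of each training column come from the user demonstrating the device action alongside the spoken command, so no transcriptions or pronunciation lexicon are needed.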
