The self-taught vocal interface

Speech technology is firmly rooted in daily life, most notably in command-and-control (C&C) applications. C&C usability degrades quickly, however, when the user's speech deviates from the norm. We pursue a fully adaptive vocal user interface (VUI) that learns both vocabulary and grammar directly from interaction examples, achieving robustness to non-standard speech by building its models from scratch. This approach raises a feasibility concern: how much training material is needed to reach acceptable recognition accuracy? In previous work, we proposed a VUI based on non-negative matrix factorisation (NMF) that discovers the recurrent acoustic and semantic patterns linking spoken commands to device-specific actions, and demonstrated its effectiveness on unimpaired speech. In this work, we evaluate the feasibility of a self-taught VUI on a new database, domotica-3, which contains dysarthric speech recordings of typical commands in a home automation setting. Additionally, we compare our NMF-based system with a system based on Gaussian mixtures. The evaluation favours our NMF-based approach, which yields usable recognition accuracies for people with dysarthric speech after only a few learning examples. Finally, we propose a multi-layered semantic frame structure and demonstrate its effectiveness in boosting overall performance.
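To make the core idea concrete, the sketch below illustrates the kind of joint factorisation the abstract describes: each utterance becomes a column vector that stacks non-negative acoustic features on top of a semantic (action) vector, and NMF with multiplicative updates finds latent patterns that tie acoustics to actions. This is a minimal illustration with synthetic data and hypothetical dimensions, not the paper's actual pipeline (which uses histogram-of-acoustic-co-occurrence features and semantic frames).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 40 acoustic features, 5 possible device actions,
# 30 training utterances, 5 latent "command" patterns to discover.
n_acoustic, n_semantic, n_utt, n_patterns = 40, 5, 30, 5

# Synthetic non-negative data: columns of V stack [acoustics; semantics],
# generated from a ground-truth dictionary so the factorisation is feasible.
true_W = rng.random((n_acoustic + n_semantic, n_patterns))
true_H = rng.random((n_patterns, n_utt))
V = true_W @ true_H

# NMF via Lee & Seung multiplicative updates, minimising squared error.
W = rng.random((n_acoustic + n_semantic, n_patterns)) + 1e-3
H = rng.random((n_patterns, n_utt)) + 1e-3
for _ in range(300):
    H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
    W *= (V @ H.T) / (W @ H @ H.T + 1e-9)

# Decoding a new utterance: only its acoustic part is observed. Fix the
# acoustic rows of W, solve for the activations h, then read the semantic
# rows of W @ h as soft scores over device actions.
v_acoustic = (true_W @ rng.random((n_patterns, 1)))[:n_acoustic]
W_a = W[:n_acoustic]
h = rng.random((n_patterns, 1)) + 1e-3
for _ in range(200):
    h *= (W_a.T @ v_acoustic) / (W_a.T @ W_a @ h + 1e-9)
action_scores = (W[n_acoustic:] @ h).ravel()
predicted_action = int(np.argmax(action_scores))
```

In the self-learning setting, the semantic rows of each training column come from the user demonstrating the device action alongside the spoken command, so no transcriptions or pronunciation lexicon are needed.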
