G-g-go! Juuump! Online Performance of a Multi-keyword Spotter in a Real-time Game

We report results for an online multi-keyword spotter in a game that contains overlapping speech, off-task side talk, and keyword forms that vary in completeness and duration. The spotter was trained on a data set collected from 62 children, and expectations for online performance were established by 10-fold cross-validation on that corpus. We compare those post hoc results with the recognizer’s online performance in a study in which 24 new children played with the real-time system. The online system showed a non-significant decline in accuracy that could be traced to difficulty recognizing the jump keyword and to the predominance of younger children in the new cohort. However, children adjusted their behavior to compensate, and the overall performance and responsiveness of the online system resulted in engaging and enjoyable gameplay.
