Enabling Non-Speech Experts to Develop Usable Speech-User Interfaces

[1]  R. Cole,et al.  The OGI Kids’ Speech Corpus and Recognizers , 2000 .

[2]  Indrani Medhi-Thies,et al.  A Hindi speech recognizer for an agricultural video search application , 2013, ACM DEV '13.

[3]  Mark J. F. Gales,et al.  Maximum likelihood linear transformations for HMM-based speech recognition , 1998, Comput. Speech Lang..

[4]  Desney S. Tan,et al.  CueFlik: interactive concept learning in image search , 2008, CHI.

[5]  E. Hutchins Cognition in the wild , 1995 .

[6]  Scott R. Klemmer,et al.  Authoring sensor-based interactions by demonstration with direct manipulation and pattern recognition , 2007, CHI.

[7]  Desney S. Tan,et al.  Interactive optimization for steering machine classification , 2010, CHI.

[8]  Matthew Kam,et al.  Improving literacy in rural India: cellphone games in an after-school program , 2009, 2009 International Conference on Information and Communication Technologies and Development (ICTD).

[9]  Elmar Nöth,et al.  Improving Children's Speech Recognition by HMM Interpolation with an Adults' Speech Recognizer , 2003, DAGM-Symposium.

[10]  James A. Landay,et al.  Gestalt: integrated support for implementation and analysis in machine learning , 2010, UIST.

[11]  John F. Canny,et al.  SPRING: speech and pronunciation improvement through games, for Hispanic children , 2010, ICTD.

[12]  Matthew Kam,et al.  Improving literacy in developing countries using speech recognition-supported games on mobile devices , 2012, CHI.

[13]  Diego Giuliani,et al.  Investigating recognition of children's speech , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[14]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[15]  Ellen Campana,et al.  Incremental understanding in human-computer dialogue and experimental evidence for advantages over nonincremental methods , 2007 .

[16]  Alexander H. Waibel,et al.  Multimodal error correction for speech user interfaces , 2001, TCHI.

[17]  Blaz Zupan,et al.  Orange: From Experimental Machine Learning to Interactive Data Mining , 2004, PKDD.

[18]  Daniel B. Horn,et al.  Patterns of entry and correction in large vocabulary continuous speech recognition systems , 1999, CHI '99.

[19]  Long Qin,et al.  Learning Out-of-Vocabulary Words in Automatic Speech Recognition , 2013 .

[20]  M. Pickering,et al.  Influence of Connectives on Language Comprehension: Eye tracking Evidence for Incremental Interpretation , 1997 .

[21]  Guus Schreiber,et al.  Knowledge Engineering and Management: The CommonKADS Methodology , 1999 .

[22]  Donald A. Norman,et al.  Things That Make Us Smart: Defending Human Attributes In The Age Of The Machine , 1993 .

[23]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[24]  Ronald Rosenfeld,et al.  Speech vs. touch-tone: Telephony interfaces for information access by low literate users , 2009, 2009 International Conference on Information and Communication Technologies and Development (ICTD).

[25]  Ronald Rosenfeld,et al.  Discriminative pronunciation learning for speech recognition for resource scarce languages , 2012, ACM DEV '12.

[26]  D. Perkins Person plus: A distributed view of thinking and learning , 1994 .

[27]  E. H. Shortliffe,et al.  Computer-based medical consultations: MYCIN , 1976 .

[28]  Marelie H. Davel,et al.  Error analysis of a public domain pronunciation dictionary , 2007 .

[29]  L. Venkata Subramaniam,et al.  A Knowledge Acquisition Method for Improving Data Quality in Services Engagements , 2010, 2010 IEEE International Conference on Services Computing.

[30]  Judith A. Markowitz “The Art and Business of Speech Recognition: Creating the Noble Voice” , 2005, Int. J. Speech Technol..

[31]  Matthew Kam,et al.  Rethinking Speech Recognition on Mobile Devices , 2011 .

[32]  Agha Ali Raza,et al.  Job opportunities through entertainment: virally spread speech-based services for low-literate users , 2013, CHI.

[33]  Biing-Hwang Juang,et al.  Fundamentals of speech recognition , 1993, Prentice Hall signal processing series.

[34]  Alexander I. Rudnicky,et al.  A unified design for human-machine voice interaction , 2001, CHI Extended Abstracts.

[35]  Dilek Z. Hakkani-Tür,et al.  Active learning: theory and applications to automatic speech recognition , 2005, IEEE Transactions on Speech and Audio Processing.

[36]  Sadaoki Furui,et al.  Error analysis using decision trees in spontaneous presentation speech recognition , 2001, IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU '01..

[37]  Phil Dutson,et al.  The Android Developer's Cookbook: Building Applications with the Android SDK , 2010 .

[38]  Daniel Povey,et al.  Speaking rate adaptation using continuous frame rate normalization , 2010, 2010 IEEE International Conference on Acoustics, Speech and Signal Processing.

[39]  Eric Fosler-Lussier,et al.  Effects of speaking rate and word frequency on pronunciations in conversational speech , 1999, Speech Commun..

[40]  Florian Metze,et al.  The speech recognition virtual kitchen , 2013, INTERSPEECH.

[41]  James R. Glass,et al.  Historical Development and Future Directions in Speech Recognition and Understanding , 2007 .

[42]  Sharon L. Oviatt,et al.  Designing speech and language interactions , 2014, CHI Extended Abstracts.

[43]  Matthew Kam,et al.  Designing e-learning games for rural children in India: a format for balancing learning with fun , 2008, DIS '08.

[44]  Bongshin Lee,et al.  Voice typing: a new speech interaction model for dictation on touchscreen devices , 2012, CHI.

[45]  Philip C. Woodland Speaker adaptation for continuous density HMMs: a review , 2001 .

[46]  Nara L. Newcomer,et al.  User Centered Design , 2014, Encyclopedia of Database Systems.

[47]  S. Wilson What Video Games Have to Teach Us about Learning and Literacy , 2006 .

[48]  Ioannis Hatzilygeroudis Using a hybrid rule-based approach in developing an intelligent tutoring system with knowledge acquisition and update capabilities , 2004, Expert Syst. Appl..

[49]  Chin-Hui Lee,et al.  MAP Estimation of Continuous Density HMM : Theory and Applications , 1992, HLT.

[50]  James H. Martin,et al.  An expert system‐based approach to prediction of year‐to‐year climatic variations in the North Atlantic region , 1999 .

[51]  Gabriel Skantze,et al.  Incremental Dialogue Processing in a Micro-Domain , 2009, EACL.

[52]  M. Bernardine Dias,et al.  The TechBridgeWorld initiative: broadening perspectives in computing technology education and research , 2005, CWIT '05.

[53]  M. Swain,et al.  Problems in Output and the Cognitive Processes They Generate: A Step Towards Second Language Learning , 1995, Applied Linguistics.

[54]  Paul Compton,et al.  Situated cognition and knowledge acquisition research , 2013, Int. J. Hum. Comput. Stud..

[55]  Paul Boersma,et al.  Praat: doing phonetics by computer , 2003 .

[56]  Ronald Rosenfeld,et al.  Keywords for a universal speech interface , 2002, CHI Extended Abstracts.

[57]  Matthew Kam,et al.  An exploratory study of unsupervised mobile learning in rural India , 2010, CHI.

[58]  Carol Van Ess-Dykema,et al.  Linguistically engineered tools for speech recognition error analysis , 1998, ICSLP.

[59]  Tim Paek,et al.  People watcher: a game for eliciting human-transcribed data for automated directory assistance , 2007, INTERSPEECH.

[60]  John M. Carroll,et al.  Mental Models in Human-Computer Interaction , 1988 .

[61]  D. Pisoni,et al.  Recognizing Spoken Words: The Neighborhood Activation Model , 1998, Ear and hearing.

[62]  Debbie Richards,et al.  Two decades of Ripple Down Rules research , 2009, The Knowledge Engineering Review.

[63]  John O. Willis,et al.  Peabody Picture Vocabulary Test–Third Edition , 2008 .

[64]  Desney S. Tan,et al.  EnsembleMatrix: interactive visualization to support machine learning with multiple classifiers , 2009, CHI.

[65]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques, 3rd Edition , 1999 .

[66]  Matthew Kam,et al.  Localized iterative design for language learning in underdeveloped regions: the PACE framework , 2007, CHI.

[67]  I. Scott MacKenzie,et al.  Text Entry for Mobile Computing: Models and Methods, Theory and Practice , 2002, Hum. Comput. Interact..

[68]  Carlos Alonso González,et al.  Basic tasks for knowledge-based supervision in process control , 2001 .

[69]  A. D. Shveĭt︠s︡er,et al.  Introduction to sociolinguistics , 1986 .

[70]  David Schlangen,et al.  Assessing and Improving the Performance of Speech Recognition for Incremental Systems , 2009, NAACL.

[71]  Payal Arora,et al.  Karaoke for social and cultural change , 2006, J. Inf. Commun. Ethics Soc..

[72]  Koichi Shinoda Speaker adaptation techniques for automatic speech recognition , 2011 .

[73]  Ralf Kompe,et al.  A Combined MAP + MLLR Approach for Speaker Adaptation , 2002 .

[74]  James Paul Gee  What video games have to teach us about learning and literacy , 2003 .

[75]  Joyojeet Pal,et al.  Multiple mice for retention tasks in disadvantaged schools , 2007, CHI.

[76]  John Canny,et al.  Speech-enabled Systems for Language Learning , 2013 .

[77]  Shuang Xu,et al.  User Expectations from Dictation on Mobile Devices , 2007, HCI.

[78]  Tim Paek,et al.  Usability guided key-target resizing for soft keyboards , 2010, IUI '10.

[79]  Shumin Zhai,et al.  Shorthand writing on stylus keyboard , 2003, CHI '03.

[80]  Chris Callison-Burch,et al.  Creating Speech and Language Data With Amazon’s Mechanical Turk , 2010, Mturk@HLT-NAACL.

[81]  Jack Mostow,et al.  Improving child literacy in Africa: Experiments with an automated reading tutor , 2009, 2009 International Conference on Information and Communication Technologies and Development (ICTD).

[82]  Gerald Penn,et al.  Automatic speech recognition for webcasts: how good is good enough and what to do when it isn't , 2006, ICMI '06.

[83]  Matthew Kam,et al.  Designing digital games for rural children: a study of traditional village games in India , 2009, CHI.

[84]  Sandra G. Hart,et al.  Nasa-Task Load Index (NASA-TLX); 20 Years Later , 2006 .

[85]  William Buxton,et al.  User learning and performance with marking menus , 1994, CHI 1994.

[86]  Michael Negnevitsky,et al.  Artificial Intelligence: A Guide to Intelligent Systems , 2001 .

[87]  María S. Carlo,et al.  The Critical Role of Vocabulary Development for English Language Learners , 2005 .

[88]  Anselm L. Strauss,et al.  Basics of qualitative research : techniques and procedures for developing grounded theory , 1998 .

[89]  C. Perfetti,et al.  The lexical quality hypothesis , 2002 .

[90]  Aitor J. Garrido,et al.  Basic theoretical results for expert systems. Application to the supervision of adaptation transients in planar robots , 2004, Artif. Intell..

[91]  K. Bot The Psycholinguistics of the Output Hypothesis , 1996 .

[92]  Carla Teixeira Lopes,et al.  TIMIT Acoustic-Phonetic Continuous Speech Corpus , 2012 .

[93]  Sharon L. Oviatt,et al.  Advances in Robust Multimodal Interface Design , 2003, IEEE Computer Graphics and Applications.

[94]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[95]  Alexander I. Rudnicky,et al.  Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System for Hand-Held Devices , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[96]  Herbert H. Clark,et al.  Grounding in communication , 1991, Perspectives on socially shared cognition.

[97]  Stephen A. Brewster,et al.  Investigating the effectiveness of tactile feedback for mobile touchscreens , 2008, CHI.

[98]  Michael Kyle McCandless Word rejection for a literacy tutor , 1992 .

[99]  Samson W. Tu,et al.  Writing Rules for the Semantic Web Using SWRL and Jess , 2005 .

[100]  Nikolaos G. Bourbakis,et al.  A knowledge-based expert system for automatic visual VLSI reverse-engineering: VLSI layout version , 2002, IEEE Trans. Syst. Man Cybern. Part A.

[101]  Janet M. Baker,et al.  The Design for the Wall Street Journal-based CSR Corpus , 1992, HLT.

[102]  Marelie H. Davel,et al.  Pronunciation dictionary development in resource-scarce environments , 2009, INTERSPEECH.

[103]  Maxine Eskénazi,et al.  Speaking to the Crowd: Looking at Past Achievements in Using Crowdsourcing for Speech and Predicting Future Challenges , 2011, INTERSPEECH.

[104]  Yajie Miao,et al.  Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN , 2014, ArXiv.

[105]  Gerald Penn,et al.  A Critical Reassessment of Evaluation Baselines for Speech Summarization , 2008, ACL.

[106]  Lori Lamel,et al.  Do speech recognizers prefer female speakers? , 2005, INTERSPEECH.

[107]  Tanja Schultz,et al.  SPICE: web-based tools for rapid language adaptation in speech processing systems , 2007, INTERSPEECH.

[108]  David Shpilberg,et al.  ExperTAX sm: an expert system for corporate tax planning , 1986 .

[109]  Paul Clancy,et al.  An Expert System for Legal Consultation , 1989, IAAI.

[110]  Kai Yu,et al.  Handbook of Natural Language Processing and Machine Translation , 2012 .

[111]  Hermann Ney,et al.  Improved backing-off for M-gram language modeling , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[112]  Alexander I. Rudnicky,et al.  Universalizing speech: notes from the USI project , 2001, INTERSPEECH.

[113]  John Kingston,et al.  Development of a KBS for personal financial planning guided by pragmatic KADS , 1995 .

[114]  Franz Kummert,et al.  Incremental speech recognition for multimodal interfaces , 1998, IECON '98. Proceedings of the 24th Annual Conference of the IEEE Industrial Electronics Society (Cat. No.98CH36200).

[115]  Hermann Ney,et al.  Joint-sequence models for grapheme-to-phoneme conversion , 2008, Speech Commun..

[116]  Karim Filali,et al.  Frontend post-processing and backend model enhancement on the Aurora 2.0/3.0 databases , 2002, INTERSPEECH.

[117]  Maxine Eskénazi,et al.  An overview of spoken language technology for education , 2009, Speech Commun..

[118]  J C Junqua,et al.  The Lombard reflex and its role on human listeners and automatic speech recognizers. , 1993, The Journal of the Acoustical Society of America.

[119]  Jason D. Williams,et al.  Stability and Accuracy in Incremental Speech Recognition , 2011, SIGDIAL Conference.

[120]  Douglas D. O'Shaughnessy,et al.  Robust gender-dependent acoustic-phonetic modelling in continuous speech recognition based on a new automatic male/female classification , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[121]  Dong Yu,et al.  Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition , 2012, IEEE Transactions on Audio, Speech, and Language Processing.

[122]  Matt Jones,et al.  We need to talk: HCI and the delicate topic of spoken language interaction , 2013, CHI Extended Abstracts.

[123]  Maxine Eskénazi,et al.  Automated two-way entrainment to improve spoken dialog system performance , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[124]  Daniel Jurafsky,et al.  Which words are hard to recognize? Prosodic, lexical, and disfluency factors that increase speech recognition error rates , 2010, Speech Commun..

[125]  Austin Henderson,et al.  Interaction design: beyond human-computer interaction , 2002, UBIQ.

[126]  Dimitris N. Chorafas,et al.  Expert Systems in Banking: A Guide for Senior Managers , 1991 .

[127]  Niels Ole Bernsen,et al.  Mental Models in Human-Computer Interaction , 2010 .

[128]  Jonathan Le Roux,et al.  Black box optimization for automatic speech recognition , 2014, 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[129]  Jesse Thomason,et al.  Differences in User Responses to a Wizard-of-Oz versus Automated System , 2013, HLT-NAACL.

[130]  Andreas Stolcke,et al.  SRILM - an extensible language modeling toolkit , 2002, INTERSPEECH.

[131]  Anne Sullivan,et al.  Designing natural speech interactions for the living room , 2013, CHI Extended Abstracts.

[132]  Takeo Igarashi,et al.  Eyepatch: prototyping camera-based interaction through examples , 2007, UIST '07.

[133]  Jerry Alan Fails,et al.  A design tool for camera-based interaction , 2003, CHI '03.

[134]  Shu-Hsien Liao,et al.  Expert system methodologies and applications - a decade review from 1995 to 2004 , 2005, Expert Syst. Appl..

[135]  Joyojeet Pal,et al.  Speech Recognition for Illiterate Access to Information and Technology , 2006, 2006 International Conference on Information and Communication Technologies and Development.

[136]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[137]  Julie C. Sedivy,et al.  Subject Terms: Linguistics Language Eyes & eyesight Cognition & reasoning , 1995 .

[138]  Julia Hirschberg,et al.  Prosodic and other cues to speech recognition failures , 2004, Speech Commun..

[139]  Joost van Doremalen,et al.  Speech Technology in CALL: The Essential Role of Adaptation , 2010 .

[140]  Susan L. Hura,et al.  Voice User Interfaces , 2022, Encyclopedia of Big Data.

[141]  Matthew Kam,et al.  Formalizing expert knowledge for developing accurate speech recognizers , 2013, INTERSPEECH.

[142]  G. A. Miller  The magical number seven, plus or minus two: some limits on our capacity for processing information , 1956, Psychological Review.

[143]  Charles A. Perfetti,et al.  Reading Ability: Lexical Quality to Comprehension , 2007 .

[144]  Ronald A. Cole,et al.  Highly accurate children's speech recognition for interactive reading tutors using subword units , 2007, Speech Commun..

[145]  Jerome R. Bellegarda,et al.  Statistical language model adaptation: review and perspectives , 2004, Speech Commun..

[146]  G. Altmann,et al.  Incremental interpretation at verbs: restricting the domain of subsequent reference , 1999, Cognition.

[147]  Anoop K. Sinha,et al.  Suede: a Wizard of Oz prototyping tool for speech user interfaces , 2000, UIST '00.

[148]  Ronanki Srikanth,et al.  Automatic Pronunciation Scoring And Mispronunciation Detection Using CMUSphinx , 2012 .

[149]  Clare-Marie Karat,et al.  Conversational Speech Interfaces and Technologies , 2007 .

[150]  Richard M. Stern,et al.  On the effects of speech rate in large vocabulary speech recognition systems , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.