Speech Recognition for Smart Homes
Ian McLoughlin

When Christopher Sholes created the QWERTY keyboard layout in the 1860s (often assumed to have been designed to slow down fast typists), few would have imagined that his invention would become the dominant input method of the 20th century. In the early years of the 21st century (the so-called 'speed and information' century), it remains dominant, despite the invention of many arguably better input devices. Surely it is time to consider alternatives, in particular the most natural method of human communication: spoken language. Spoken language is not only natural, but in many cases is faster than typed or mouse-driven input, and is accessible at times and in locations where keyboard, mouse and monitor (KMM) may not be convenient to use. In particular, in a world with growing penetration of embedded computers, the so-called 'smart home' may well see the first mass-market deployment of vocal interaction (VI) systems.

What is necessary in order to make VI a reality within the smart home? In fact, much of the underlying technology already exists: many home appliances, electrical devices, infotainment systems, sensors and so on are sufficiently intelligent to be networked. Wireless home networks are fast and very common. Speech synthesis technology can generate natural-sounding speech. Microphone and loudspeaker technology is well established. Modern computers are highly capable, relatively inexpensive and, as embedded systems, have already penetrated almost all parts of a modern home. However, the bottleneck in the realisation of smart home systems appears to have been automatic speech recognition (ASR) and natural language understanding.

In this chapter, we establish the case for ASR as part of VI within the home. We then give an overview of appropriate ASR technology and present an analysis of the environment and operational conditions within the home as they relate to ASR, in particular the argument for restricting vocabulary size to improve recognition accuracy. Finally, the discussion concludes with details of modifications to the widely used Sphinx ASR system for smart home deployment on embedded computers. We will demonstrate that such deployments are sensible, possible, and in fact will be coming to homes soon.
