On Distant Speech Recognition for Home Automation

In the framework of Ambient Assisted Living, home automation may be a solution for helping elderly people who live alone at home. This study is part of the Sweet-Home project, which aims at developing a new home automation system based on voice commands to improve the support and well-being of people in loss of autonomy. The goal of the study is voice command recognition, with a focus on two aspects: distant speech recognition and sentence spotting. Several ASR techniques were evaluated on a realistic corpus acquired in a 4-room flat equipped with microphones set in the ceiling. This distant-speech French corpus was recorded with 21 speakers who acted out scenarios of activities of daily living. Techniques acting at the decoding stage, such as our novel approach called the Driven Decoding Algorithm (DDA), gave better speech recognition results than the baseline and the other approaches. This solution, which uses the two channels with the best SNR together with a priori knowledge (voice commands and distress sentences), increased the recognition rate without introducing false alarms. More broadly, a short overview then outlines the research challenges that speech technologies must address for Ambient Assisted Living and Augmentative and Alternative Communication, as well as the current research avenues in this domain.
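
The abstract combines two ideas. The first is to keep only the channels with the best signal-to-noise ratio. The second is to exploit a priori knowledge of the expected sentences (voice commands and distress sentences), so that only utterances close to a known sentence are accepted. The following Python code is a minimal sketch of these two ideas. It assumes that per-channel SNR estimates are already available and that sentence spotting is done after decoding, using a word-level Levenshtein distance against a list of known sentences. This is not the Driven Decoding Algorithm itself, which injects a priori information during decoding. The function names, the 0.25 threshold and the English example commands are hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical list of known sentences (commands and distress calls), for illustration only.
COMMANDS = ["turn on the light", "close the blinds", "call for help"]


def best_snr_channels(snr_db: Sequence[float], k: int = 2) -> List[int]:
    """Return the indices of the k channels with the highest estimated SNR (dB)."""
    ranked = sorted(range(len(snr_db)), key=lambda i: snr_db[i], reverse=True)
    return ranked[:k]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two token sequences (insertion, deletion, substitution cost 1)."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def spot_sentence(hypothesis: str, known: Sequence[str],
                  max_ratio: float = 0.25) -> Optional[str]:
    """Map an ASR hypothesis to the closest known sentence.

    Returns None when even the closest sentence differs by more than max_ratio
    (edit distance divided by the reference length). Rejecting distant matches
    like this is what keeps false alarms down.
    """
    hyp = hypothesis.lower().split()
    best, best_ratio = None, float("inf")
    for sentence in known:
        ref = sentence.lower().split()
        ratio = levenshtein(hyp, ref) / max(len(ref), 1)
        if ratio < best_ratio:
            best, best_ratio = sentence, ratio
    return best if best_ratio <= max_ratio else None
```

For example, `spot_sentence("turn on the lights", COMMANDS)` maps the slightly misrecognized hypothesis to `"turn on the light"`, whereas an unrelated utterance such as `"what time is it"` is rejected. Raising `max_ratio` accepts looser matches but increases the risk of false alarms, which is the trade-off the abstract highlights.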
