Speech and speaker recognition for home automation: Preliminary results

In voice-controlled multi-room smart homes, ASR and speaker identification systems face distant-speech conditions, which have a significant impact on performance. For voice command recognition, this paper presents an approach that dynamically selects the best channel and adapts models to the environmental conditions. The method was tested on data recorded with 11 elderly and visually impaired participants in a real smart home. The voice command recognition error rate was 3.2% in the off-line condition and 13.2% in the online condition. For speaker identification, performance was very speaker dependent. However, we show a high correlation between performance and training size. The main difficulty was that utterances were much shorter than those used in state-of-the-art studies. Moreover, because speaker identification performance depends on the size of the adaptation corpus, users must record enough data before using the system.
