Context-Aware Voice-Based Interaction in Smart Home - VocADom@A4H Corpus Collection and Empirical Assessment of Its Usefulness

Smart homes aim to enhance the quality of life of people at home through home automation systems and Ambient Intelligence. Most smart homes provide enhanced interaction by relying on context-aware systems learned from data. Although voice-based interaction is an emerging trend, most available corpora cover either home automation sensors only or audio only, which limits the development of context-aware voice-based systems. This paper presents the VocADom@A4H corpus, a dataset of users' interactions recorded in a fully equipped smart home. About 12 hours of multichannel distant speech, synchronized with the logs of an openHAB home automation system, were collected from 11 participants performing activities of daily living in the presence of real-life noise sources such as other people speaking, a vacuum cleaner, and a TV. This corpus can serve as valuable material for studies in pervasive intelligence, such as human tracking, human activity recognition, context-aware interaction, and robust distant speech processing in the home. Experiments on multichannel speech and home automation sensor data for robust voice activity detection and multi-resident localization show the potential of the corpus to support the development of context-aware smart home systems.
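Since the corpus pairs multichannel distant speech with openHAB home automation logs, a central processing step is aligning timestamped sensor events with speech segments. The sketch below is a minimal illustration of one way to do this; the file name, CSV columns, and event format are assumptions made for the example and are not the corpus' documented schema.

# Minimal sketch of aligning home-automation events with an audio segment by
# timestamp. File name, column names, and event layout are hypothetical.
import csv
from bisect import bisect_left
from datetime import datetime

def load_events(path):
    """Load openHAB-style events as (timestamp, item, state) tuples, sorted by time."""
    events = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            ts = datetime.fromisoformat(row["timestamp"])
            events.append((ts, row["item"], row["state"]))
    events.sort(key=lambda e: e[0])
    return events

def events_in_window(events, start, end):
    """Return the sensor events whose timestamps fall inside [start, end)."""
    times = [e[0] for e in events]
    lo = bisect_left(times, start)
    hi = bisect_left(times, end)
    return events[lo:hi]

if __name__ == "__main__":
    # Hypothetical export of openHAB events and a hypothetical speech-segment window.
    events = load_events("openhab_events.csv")
    seg_start = datetime.fromisoformat("2018-03-01T10:15:02")
    seg_end = datetime.fromisoformat("2018-03-01T10:15:07")
    for ts, item, state in events_in_window(events, seg_start, seg_end):
        print(ts, item, state)

Such a window query is the kind of operation a context-aware system would perform to combine sensor context with a detected voice command.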
