TILES audio recorder: an unobtrusive wearable solution to track audio activity

Most existing speech activity trackers used in human-subject studies are bulky, record raw audio content that invades participant privacy, rely on complicated hardware and non-customizable software, and are too expensive for large-scale deployment. The present effort seeks to overcome these challenges with the TILES Audio Recorder (TAR): an unobtrusive and scalable solution for tracking audio activity using an affordable miniature mobile device with an open-source app. The recorder runs on the Jelly Pro, a pocket-sized Android smartphone, and employs two open-source toolkits: openSMILE and TarsosDSP. TarsosDSP provides a voice activity detection capability that triggers openSMILE to extract and save audio features only while the subject is speaking. Experiments show that performing feature extraction only during speech segments greatly extends battery life, enabling the subject to wear the recorder for up to 10 hours at a time. Furthermore, recording experiments with ground-truth clean speech show minimal distortion of the recorded features, as measured by root-mean-square error and cosine distance. The TAR app also provides subjects with a simple user interface that lets them both pause feature extraction at any time and easily upload data to a remote server.
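The core battery-saving idea, gating feature extraction on voice activity, can be illustrated with a minimal sketch. The class and names below are hypothetical, and a simple short-time energy threshold stands in for TarsosDSP's detector and a print statement for the openSMILE feature-extraction call; the actual TAR app wires these toolkits together on Android.

```java
// Minimal sketch of VAD-gated processing (assumption: an energy threshold
// stands in for the TarsosDSP voice activity detector; the "extract features"
// step stands in for invoking openSMILE).
public class VadGateSketch {
    // Frames quieter than this (in dB relative to full scale) are treated as silence.
    static final double SILENCE_THRESHOLD_DB = -70.0;

    // Sound pressure level of one audio frame, in dB relative to full scale.
    static double soundPressureLevel(float[] frame) {
        double power = 0.0;
        for (float s : frame) power += s * s;
        return 10.0 * Math.log10(power / frame.length + 1e-12);
    }

    // True when the frame is loud enough to be treated as speech.
    static boolean isSpeech(float[] frame) {
        return soundPressureLevel(frame) > SILENCE_THRESHOLD_DB;
    }

    public static void main(String[] args) {
        // A silent frame (all zeros) and a 440 Hz tone at 16 kHz sampling rate.
        float[] silence = new float[512];
        float[] tone = new float[512];
        for (int i = 0; i < tone.length; i++) {
            tone[i] = (float) (0.5 * Math.sin(2 * Math.PI * 440 * i / 16000.0));
        }

        for (float[] frame : new float[][] { silence, tone }) {
            if (isSpeech(frame)) {
                System.out.println("speech: extract features");  // openSMILE would run here
            } else {
                System.out.println("silence: skip extraction");  // CPU stays idle, saving battery
            }
        }
        // prints "silence: skip extraction" then "speech: extract features"
    }
}
```

Because the expensive feature-extraction stage only runs on frames that pass the gate, the processor idles through the long silent stretches of a typical workday, which is what makes the reported 10-hour battery life plausible.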
