VoiceLabel: using speech to label mobile sensor data

Many mobile machine learning applications require collecting and labeling data, and a traditional GUI on a mobile device may not be an appropriate or viable method for this task. This paper presents an alternative approach to mobile labeling of sensor data called VoiceLabel. VoiceLabel consists of two components: (1) a speech-based data collection tool for mobile devices, and (2) a desktop tool for offline segmentation of recorded data and recognition of spoken labels. The desktop tool automatically analyzes the audio stream to find and recognize spoken labels, and then presents a multimodal interface for reviewing and correcting data labels using a combination of the audio stream, the system's analysis of that audio, and the corresponding mobile sensor data. A study with ten participants showed that VoiceLabel is a viable method for labeling mobile sensor data. VoiceLabel also illustrates several key features that inform the design of other data labeling tools.
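
The paper does not include code, but the offline analysis step it describes (scanning a recorded audio stream to find candidate spoken labels before recognition) can be illustrated with a simple energy-based voice activity detector. The sketch below is an assumption on my part, not the paper's actual method; the function name, parameters, and thresholding scheme are all hypothetical, and a real system would pass each detected span to a speech recognizer and align it with sensor timestamps.

```python
import numpy as np

def find_label_segments(audio, sample_rate, frame_ms=30,
                        energy_ratio=4.0, min_label_ms=200):
    """Return (start_sec, end_sec) spans of likely spoken labels.

    A minimal energy-based voice activity detector: frames whose RMS
    energy exceeds a multiple of the median (background) energy are
    treated as speech. This is only a sketch of the kind of offline
    segmentation VoiceLabel's desktop tool performs.
    """
    # Split the recording into fixed-length frames.
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(audio) // frame_len
    frames = audio[:n_frames * frame_len].reshape(n_frames, frame_len)

    # Per-frame RMS energy; the median approximates background noise.
    rms = np.sqrt((frames.astype(np.float64) ** 2).mean(axis=1))
    active = rms > energy_ratio * np.median(rms)

    # Merge consecutive active frames into contiguous segments.
    segments, start = [], None
    for i, is_speech in enumerate(active):
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, n_frames))

    # Drop segments too short to contain a spoken label.
    min_frames = max(1, int(min_label_ms / frame_ms))
    return [(s * frame_ms / 1000, e * frame_ms / 1000)
            for s, e in segments if e - s >= min_frames]
```

Each returned time span could then be matched against the mobile sensor log by timestamp and presented alongside the audio in a review interface, mirroring the multimodal correction workflow the paper describes.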
