SoundSense: scalable sound sensing for people-centric applications on mobile phones

High-end mobile phones include a number of specialized sensors (e.g., accelerometer, compass, GPS) and general-purpose sensors (e.g., microphone, camera) that enable new people-centric sensing applications. Perhaps the most ubiquitous yet least exploited sensor on mobile phones is the microphone, a powerful sensor that can support sophisticated inferences about human activity, location, and social events from sound. In this paper, we exploit this untapped sensor not in the context of human communications but as an enabler of new sensing applications. We propose SoundSense, a scalable framework for modeling sound events on mobile phones. SoundSense is implemented on the Apple iPhone and represents the first general-purpose sound sensing system specifically designed to work on resource-limited phones. The architecture and algorithms are designed for scalability, and SoundSense uses a combination of supervised and unsupervised learning techniques both to classify general sound types (e.g., music, voice) and to discover novel sound events specific to individual users. The system runs solely on the mobile phone with no back-end interactions. Through the implementation and evaluation of two proof-of-concept people-centric sensing applications, we demonstrate that SoundSense is capable of recognizing meaningful sound events that occur in users' everyday lives.
