An Environmental Audio-Based Context Recognition System Using Smartphones

Environmental sound (audio) is a rich source of information that can be used to infer a person's context in daily life. Almost every activity produces characteristic sound patterns, e.g., speaking, walking, washing, or typing on a computer. Most locations also have a typical sound pattern, e.g., restaurants, offices, or streets. This thesis addresses the design and development of an application for real-time detection and recognition of user activities from audio signals on mobile phones. Audio-based recognition extends the capabilities and context awareness of mobile phones and thereby makes them more convenient for their users. For example, a smartphone can automatically switch to silent mode when its user enters a meeting, or provide information tailored to the user's location. However, mobile phones have limited CPU, memory, and energy resources, so the audio recognition application must be designed to fit within these constraints. In this thesis we compare the performance of different audio classifiers (k-NN, SVM, and GMM) and audio feature extraction techniques in terms of recognition accuracy and computational speed in order to select the best combination. We evaluate the audio event recognition techniques on a set of six daily-life sound classes (coffee machine brewing, water tap (hand washing), walking, elevator, door opening/closing, and silence). Test results show that the k-NN classifier, when used with mel-frequency cepstral coefficients (MFCCs), spectral entropy (SE), and spectral centroid (SC) features, outperforms the other classifiers in both recognition accuracy and execution time. The audio features were selected on the basis of simulation results, in which this feature set performed best among the candidates evaluated. An online audio event recognition application is then implemented as an Android app using the k-NN classifier and the selected features.
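As a rough illustration of the pipeline summarised above, the following minimal Python sketch computes two of the named features, spectral centroid (SC) and spectral entropy (SE), for one frame of samples, and classifies a feature vector with a plain k-NN majority vote. The abstract does not give the exact formulas, normalisation, value of k, or distance metric used in the thesis, so the textbook definitions below (magnitude-weighted SC, Shannon entropy of the normalised power spectrum, Euclidean distance) and all function names are assumptions; MFCCs are left out for brevity.

```python
import cmath
import math
from collections import Counter

# Assumed definitions (not taken from the thesis): one-sided magnitude spectrum,
# magnitude-weighted spectral centroid, Shannon entropy of the power spectrum,
# Euclidean k-NN with majority vote.


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) DFT.

    A real-time app would use an FFT; the naive form keeps this sketch dependency-free.
    """
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency of the frame, in Hz."""
    mags = magnitude_spectrum(frame)
    n = len(frame)
    total = sum(mags)
    if total == 0.0:
        return 0.0  # silent frame: define the centroid as 0 Hz
    # Bin k corresponds to frequency k * sample_rate / n.
    return sum(k * sample_rate / n * m for k, m in enumerate(mags)) / total


def spectral_entropy(frame):
    """Shannon entropy (in bits) of the normalised power spectrum."""
    power = [m * m for m in magnitude_spectrum(frame)]
    total = sum(power)
    if total == 0.0:
        return 0.0  # silent frame: no spectral distribution to measure
    probs = [p / total for p in power if p > 0.0]
    return -sum(p * math.log2(p) for p in probs)


def knn_classify(query, training, k=3):
    """Return the majority label among the k training vectors nearest to query.

    training is a list of (feature_vector, label) pairs. Neighbours are sorted by
    Euclidean distance, and Counter keeps first-seen order, so a tied vote goes
    to the label whose member is nearest to the query.
    """
    neighbours = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

Because SC is measured in Hz and SE in bits, the two features differ in scale by orders of magnitude; a practical version would standardise each feature (for example, to zero mean and unit variance over the training set) before computing distances, so that the centroid does not dominate the vote.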
The application continuously classifies audio events (user activities) by analyzing environmental sounds sampled from the smartphone's microphone, and it gives the user a real-time display of the recognized context (activity). We also analyze the impact of other parameters, such as the analysis window size and the overlap size, on recognition performance; the test results show that varying these parameters has no significant impact on the performance of the audio recognition technique. Finally, we compare the online recognition results of the same classifier (k-NN) with its offline classification results.
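The analysis window and overlap mentioned above determine how the microphone stream is cut into frames before feature extraction. A minimal sketch, assuming both sizes are given in samples and that a trailing partial frame is simply discarded (the abstract does not say how the app handles it); `frame_signal` is a hypothetical helper name. With window length W and overlap O, consecutive frames start H = W - O samples apart, so a buffer of L >= W samples yields 1 + floor((L - W) / H) frames.

```python
def frame_signal(samples, window_size, overlap):
    """Split samples into frames of window_size samples overlapping by overlap samples.

    Consecutive frames start hop = window_size - overlap samples apart.
    Samples left over after the last full frame are dropped (an assumption).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must satisfy 0 <= overlap < window_size")
    hop = window_size - overlap
    # The last valid start index is len(samples) - window_size; if the buffer is
    # shorter than one window, the range is empty and no frames are returned.
    return [
        samples[start:start + window_size]
        for start in range(0, len(samples) - window_size + 1, hop)
    ]
```

For example, `frame_signal(list(range(10)), 4, 2)` returns four frames starting at samples 0, 2, 4 and 6. A larger overlap produces more frames (and therefore more classifications) per second of audio at a proportionally higher CPU cost, which matters on a phone.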
