Privacy-aware environmental sound classification for indoor human activity recognition

This paper presents a comparative study of feature extraction and machine learning techniques for indoor environmental sound classification. Compared with outdoor environmental sound classification systems, indoor systems must pay special attention to power consumption and privacy. We therefore use feature calculation complexity, classification accuracy, and privacy as evaluation metrics. To ensure privacy, we strip voice bands from the sound input so that human conversations become unrecognizable. With 5 classes of 2500 indoor audio events as input, our experimental results show that an SVM model with LPCC features reaches 78% classification accuracy. Furthermore, accuracy improves to more than 85% when several simple features are combined and unreliable predictions are dropped, which increases the complexity only slightly.
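The idea of dropping unreliable predictions can be illustrated with a minimal sketch. It assumes the classifier outputs one probability per class and that a prediction counts as unreliable when its highest probability falls below a fixed threshold. The function names, the threshold value of 0.6, and the sample probabilities are hypothetical; the abstract does not specify how reliability is measured.

```python
from typing import List, Optional, Sequence, Tuple


def predict_with_rejection(probs: Sequence[float],
                           threshold: float = 0.6) -> Optional[int]:
    """Return the most probable class index, or None if confidence is too low.

    The threshold is an assumed, illustrative value, not one from the paper.
    """
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] >= threshold else None


def accuracy_on_kept(prob_rows: List[Sequence[float]],
                     labels: List[int],
                     threshold: float = 0.6) -> Tuple[float, float]:
    """Compute accuracy over kept predictions and the fraction of inputs kept."""
    kept = 0
    correct = 0
    for probs, label in zip(prob_rows, labels):
        pred = predict_with_rejection(probs, threshold)
        if pred is None:
            continue  # unreliable prediction: dropped, not counted
        kept += 1
        correct += int(pred == label)
    coverage = kept / len(labels) if labels else 0.0
    accuracy = correct / kept if kept else 0.0
    return accuracy, coverage


# Hypothetical class probabilities for three audio events over 5 classes.
rows = [
    [0.90, 0.05, 0.05, 0.00, 0.00],  # confident
    [0.30, 0.30, 0.20, 0.10, 0.10],  # ambiguous, will be dropped
    [0.10, 0.70, 0.10, 0.05, 0.05],  # confident
]
labels = [0, 2, 0]
accuracy, coverage = accuracy_on_kept(rows, labels)
```

Raising the threshold trades coverage for accuracy: fewer events receive a label, but the labels that remain are more likely to be correct. The extra cost is one comparison per prediction.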
