Characterizing the Effect of Audio Degradation on Privacy Perception and Inference Performance in Audio-Based Human Activity Recognition

Audio has been increasingly adopted as a sensing modality in a variety of human-centered mobile applications and in smart home assistants. Although acoustic features can capture complex semantic information about human activities and context, continuous audio recording often raises significant privacy concerns. An intuitive way to reduce these concerns is to degrade audio quality so that speech and other relevant acoustic markers become unintelligible, but this often comes at the cost of activity recognition performance. In this paper, we employ a mixed-methods approach to characterize this trade-off. We first conduct an online survey with 266 participants to capture, both qualitatively and quantitatively, how they perceive privacy in degraded audio. Given our finding that privacy concerns can be significantly reduced at high levels of audio degradation, we then investigate how intentional degradation of audio frames affects the recognition of the target classes while maintaining effective privacy mitigation. Our results indicate that degrading audio frames has minimal effect on recognition that uses frame-level features. Degrading audio frames does hurt, to some extent, the performance of recognition that uses segment-level features, though such features may still yield superior recognition performance. Because different sensing purposes place different requirements on privacy mitigation and recognition performance, these trade-offs need to be balanced in actual implementations.
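
The abstract does not specify how audio frames are degraded, so the following is only a minimal sketch under a stated assumption: degradation is modeled as zeroing out a random fraction of fixed-length frames. The function names (`split_frames`, `degrade_frames`, `frame_features`, `segment_features`), the log-energy feature, and all parameter values are hypothetical choices for illustration, not the paper's method. The sketch shows one plausible reason the two feature types could respond differently: a per-frame feature of a surviving frame is untouched, while a statistic pooled over the whole segment shifts once dropped frames are included.

```python
import numpy as np

def split_frames(signal, frame_len):
    """Cut a 1-D signal into non-overlapping frames of frame_len samples."""
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

def degrade_frames(frames, drop_fraction, rng):
    """Assumed degradation: silence a random subset of frames.

    Returns the degraded frames and a boolean mask of frames that were kept.
    """
    n_frames = frames.shape[0]
    n_drop = int(round(drop_fraction * n_frames))
    dropped = rng.choice(n_frames, size=n_drop, replace=False)
    degraded = frames.copy()
    degraded[dropped] = 0.0
    kept = np.ones(n_frames, dtype=bool)
    kept[dropped] = False
    return degraded, kept

def frame_features(frames):
    """Hypothetical frame-level feature: log energy of each frame."""
    return np.log(np.mean(frames ** 2, axis=1) + 1e-10)

def segment_features(frame_feats):
    """Hypothetical segment-level feature: mean and std over all frames."""
    return np.array([frame_feats.mean(), frame_feats.std()])

# Demo on synthetic noise standing in for one second of 16 kHz audio.
rng = np.random.default_rng(0)
signal = rng.standard_normal(16000)
frames = split_frames(signal, frame_len=400)          # 40 frames of 25 ms
degraded, kept = degrade_frames(frames, drop_fraction=0.5, rng=rng)

clean_frame_feats = frame_features(frames)
degraded_frame_feats = frame_features(degraded)
clean_segment = segment_features(clean_frame_feats)
degraded_segment = segment_features(degraded_frame_feats)

print("frames kept:", int(kept.sum()), "of", len(kept))
print("segment features (clean):   ", clean_segment)
print("segment features (degraded):", degraded_segment)
```

In this toy setting the frame-level features of the kept frames are identical before and after degradation, whereas the pooled segment statistics change. Whether the same mechanism explains the results reported above depends on the actual degradation and feature pipeline, which the abstract does not describe.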
