Temporal Dynamics of Workplace Acoustic Scenes: Egocentric Analysis and Prediction

Identification of the acoustic environment from an audio recording, also known as acoustic scene classification, is an active area of research. In this paper, we study dynamically changing background acoustic scenes from the egocentric perspective of an individual in a workplace. In a novel data collection setup, wearable sensors were deployed on individuals to collect audio signals within a built environment, while Bluetooth-based hubs continuously tracked each individual's location, which represents the acoustic scene at a given time. The data in this paper come from 170 hospital workers, collected continuously during their work shifts over a 10-week period. In the first part of our study, we investigate temporal patterns in the egocentric sequence of acoustic scenes encountered by an employee, and the association of those patterns with factors such as the individual's job role and daily routine. Motivated by evidence of the multifaceted effects of ambient sounds on human psychology, we also analyze the association of the temporal dynamics of the perceived acoustic scenes with particular behavioral traits of the individual. Experiments reveal rich temporal patterns in the acoustic scenes experienced by the individuals during their work shifts, and a strong association of those patterns with various constructs related to the job roles and behavior of the employees. In the second part of our study, we employ deep learning models to predict the temporal sequence of acoustic scenes from the egocentric audio signal. We propose a two-stage framework in which a recurrent neural network is trained on top of the latent acoustic representations learned by a segment-level neural network. The experimental results show the efficacy of the proposed system in predicting the sequence of acoustic scenes, highlighting the existence of underlying temporal patterns in the acoustic scenes experienced in the workplace.
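The two-stage framework described above pairs a segment-level network, which maps each audio segment's acoustic features to a latent embedding, with a recurrent model that consumes the embedding sequence and emits a scene posterior per segment. The following is a minimal illustrative sketch of that structure, not the authors' implementation: the single-layer embedding, the GRU cell, and all dimensions (40 input features, 16-dimensional embeddings, 8 hidden units, 4 scene classes) are assumptions chosen only to make the data flow concrete.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def embed_segment(x, W, b):
    # Stage 1: segment-level network mapping a segment's acoustic features
    # to a latent embedding (a one-layer stand-in for a trained DNN).
    return np.tanh(x @ W + b)

def gru_step(h, x, p):
    # Stage 2: one step of a GRU over the embedding sequence.
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])       # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])       # reset gate
    h_cand = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])
    return (1.0 - z) * h + z * h_cand

def predict_scene_sequence(segments, W_emb, b_emb, gru, W_out, b_out):
    """Return one acoustic-scene posterior per audio segment."""
    h = np.zeros(gru["Uz"].shape[0])
    posteriors = []
    for x in segments:                      # one acoustic segment at a time
        e = embed_segment(x, W_emb, b_emb)  # latent acoustic representation
        h = gru_step(h, e, gru)             # carry temporal context forward
        posteriors.append(softmax(h @ W_out + b_out))
    return np.stack(posteriors)             # shape: (num_segments, num_scenes)

# Illustrative dimensions: features, embedding, hidden state, scenes, segments.
rng = np.random.default_rng(0)
F, D, H, C, T = 40, 16, 8, 4, 12
W_emb, b_emb = rng.normal(0, 0.1, (F, D)), np.zeros(D)
gru = {k: rng.normal(0, 0.1, (D, H)) for k in ("Wz", "Wr", "Wh")}
gru |= {k: rng.normal(0, 0.1, (H, H)) for k in ("Uz", "Ur", "Uh")}
gru |= {k: np.zeros(H) for k in ("bz", "br", "bh")}
W_out, b_out = rng.normal(0, 0.1, (H, C)), np.zeros(C)

probs = predict_scene_sequence(rng.normal(size=(T, F)), W_emb, b_emb, gru, W_out, b_out)
```

In practice both stages would be trained (the paper trains the recurrent network on top of the representations learned by the segment-level network); the sketch only shows how temporal context from earlier segments propagates into each scene prediction.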
