SoundWatch: Exploring Smartwatch-based Deep Learning Approaches to Support Sound Awareness for Deaf and Hard of Hearing Users

Smartwatches have the potential to provide glanceable, always-available sound feedback to people who are deaf or hard of hearing (DHH). In this paper, we present a performance evaluation of four low-resource deep learning sound classification models (MobileNet, Inception, ResNet-lite, and VGG-lite) across four device architectures (watch-only, watch+phone, watch+phone+cloud, and watch+cloud). While direct comparison with prior work is challenging, our results show that the best model, VGG-lite, performed similarly to the state of the art for non-portable devices, with an average accuracy of 81.2% (SD=5.8%) across 20 sound classes and 97.6% (SD=1.7%) across the three highest-priority sounds. Among the device architectures, watch+phone provided the best balance of CPU usage, memory usage, network usage, and classification latency. Based on these experimental results, we built a smartwatch-based sound awareness app, called SoundWatch (Figure 1), and conducted a qualitative lab evaluation with eight DHH participants. Our findings show support for smartwatch-based sound awareness but also surface concerns about misclassifications, latency, and privacy. We close by offering design considerations for future wearable sound awareness technology.
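To make the classification pipeline concrete, the sketch below shows the per-window inference step such an architecture implies; it is not the paper's implementation. The model file name (vgg_lite.tflite), the input representation (16 kHz mono audio converted to a 64-band log-mel spectrogram), and the label set are assumptions for illustration, none of which are specified in the abstract.

```python
# Minimal sketch of one sound classification step, assuming a TensorFlow
# Lite export of one of the evaluated models. Model path, input shape,
# and labels below are hypothetical.
import numpy as np
import librosa
import tensorflow as tf

LABELS = ["fire alarm", "door knock", "dog bark"]  # hypothetical subset of the 20 classes

# Load the (assumed) TFLite model once at startup, not per window.
interpreter = tf.lite.Interpreter(model_path="vgg_lite.tflite")  # hypothetical file
interpreter.allocate_tensors()
input_detail = interpreter.get_input_details()[0]
output_detail = interpreter.get_output_details()[0]

def classify_window(audio: np.ndarray, sr: int = 16000):
    """Classify a ~1-second audio window; returns (label, confidence)."""
    # CNN-based sound classifiers commonly take a log-mel spectrogram as input.
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=64)
    log_mel = librosa.power_to_db(mel).astype(np.float32)

    # Add batch and channel dimensions to match the assumed model input shape.
    x = log_mel[np.newaxis, :, :, np.newaxis]
    interpreter.set_tensor(input_detail["index"], x)
    interpreter.invoke()
    probs = interpreter.get_tensor(output_detail["index"])[0]

    best = int(np.argmax(probs))
    return LABELS[best], float(probs[best])
```

In the watch+phone architecture that fared best in our experiments, a step like this would run on the phone: the watch captures the audio window, ships it over Bluetooth, and displays the returned label and confidence to the user.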
