Exploring Automatic Diagnosis of COVID-19 from Crowdsourced Respiratory Sound Data

Audio signals generated by the human body (e.g., sighs, breathing, heart, digestion, vibration sounds) have routinely been used by clinicians as indicators to diagnose disease or assess disease progression. Until recently, such signals were usually collected through manual auscultation at scheduled visits. Research has now started to use digital technology (e.g., digital stethoscopes) to gather bodily sounds for cardiovascular or respiratory examination, which can then be analysed automatically. Some initial work shows promise in detecting diagnostic signals of COVID-19 from voice and coughs. In this paper we describe our data analysis over a large-scale crowdsourced dataset of respiratory sounds collected to aid diagnosis of COVID-19. We use coughs and breathing to understand how discernible COVID-19 sounds are from those of people with asthma or of healthy controls. Our results show that even a simple binary machine learning classifier is able to correctly classify healthy and COVID-19 sounds. We also show how we distinguish users who tested positive for COVID-19 and have a cough from healthy users with a cough, and users who tested positive for COVID-19 and have a cough from users with asthma and a cough. Our models achieve an AUC above 80% across all tasks. These results are preliminary and only scratch the surface of the potential of this type of data and audio-based machine learning. This work opens the door to further investigation of how automatically analysed respiratory patterns could be used as pre-screening signals to aid COVID-19 diagnosis.
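To make the reported metric concrete, the following is a minimal sketch of how an AUC (area under the ROC curve) can be computed for a binary task such as COVID-19 versus healthy. It is not the authors' pipeline: the function name `roc_auc`, the labels, and the scores are all hypothetical and chosen only for illustration. The AUC is computed here as the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC for binary labels (1 = positive, 0 = negative).

    Computed as the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one; ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical classifier outputs (NOT data from the paper):
# 1 = tested positive for COVID-19, 0 = healthy control.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

auc = roc_auc(labels, scores)
# 8 of the 9 positive/negative pairs are ranked correctly.
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a value above 0.8 means that in more than 80% of such pairs the positive sample is ranked above the negative one.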
