Audio for Audio is Better? An Investigation on Transfer Learning Models for Heart Sound Classification

Cardiovascular disease is one of the leading causes of death worldwide. Over the past decade, heart sound classification has been increasingly studied as a feasible, non-invasive approach to monitoring a subject's health status. In particular, relevant studies have benefited from the rapid development of wearable devices and machine learning techniques. Nevertheless, finding and designing efficient acoustic features for heart sounds remains an expensive and time-consuming task. Transfer learning methods can automatically extract higher-level representations from heart sounds without any human domain knowledge. However, most existing studies rely on models pre-trained on images, which may not fully capture the characteristics of audio. To this end, we propose a novel transfer learning model pre-trained on large-scale audio data for the heart sound classification task. In this study, the PhysioNet CinC Challenge Dataset is used for evaluation. Experimental results demonstrate that our proposed pre-trained audio models outperform popular models pre-trained on images, achieving the highest unweighted average recall of 89.7%.
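
As a rough illustration of the setup the abstract describes (not the authors' exact pipeline), the sketch below fine-tunes an ImageNet-pretrained CNN baseline on log-Mel spectrograms of heart sound recordings and scores predictions with unweighted average recall (UAR, i.e., macro-averaged recall). All feature settings, the backbone choice, and the hyperparameters here are illustrative assumptions; the paper's proposal amounts to replacing such an image-pretrained backbone with one pre-trained on large-scale audio data (e.g., PANNs) while keeping the same fine-tuning and UAR evaluation.

```python
# Minimal sketch, assuming PyTorch/torchaudio/torchvision and binary
# normal-vs-abnormal labels as in the PhysioNet/CinC 2016 heart sound task.
import torch
import torch.nn as nn
import torchaudio
import torchvision
from sklearn.metrics import recall_score

NUM_CLASSES = 2  # normal vs. abnormal heart sounds

# Log-Mel front end; these parameter values are assumptions, not the paper's.
melspec = torchaudio.transforms.MelSpectrogram(
    sample_rate=2000, n_fft=256, hop_length=64, n_mels=64
)
to_db = torchaudio.transforms.AmplitudeToDB()

def features(waveform: torch.Tensor) -> torch.Tensor:
    """Map (batch, samples) waveforms to (batch, 3, n_mels, frames)."""
    spec = to_db(melspec(waveform)).unsqueeze(1)
    # Image-pretrained backbones expect 3 input channels, so tile the
    # single-channel spectrogram.
    return spec.repeat(1, 3, 1, 1)

# Image-pretrained baseline: swap the classifier head, then fine-tune.
model = torchvision.models.resnet50(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(waveform: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    loss = criterion(model(features(waveform)), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

def uar(y_true, y_pred) -> float:
    # Unweighted average recall = recall averaged over classes with
    # equal weight, which is sklearn's macro-averaged recall.
    return recall_score(y_true, y_pred, average="macro")
```

UAR is the natural metric here because the dataset is class-imbalanced: averaging per-class recall with equal weight prevents a model that favors the majority class from scoring well.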
