Deep Learning for ECG Analysis: Benchmarks and Insights from PTB-XL

Electrocardiography (ECG) is a very common, non-invasive diagnostic procedure, and its interpretation is increasingly supported by algorithms. Progress in automatic ECG analysis has so far been hampered by a lack of appropriate training datasets and of well-defined evaluation procedures that ensure comparability between algorithms. To address these issues, we present first benchmarking results for the recently published, freely accessible clinical 12-lead ECG dataset PTB-XL, covering tasks ranging from the prediction of different ECG statements to age and sex prediction. Among the investigated deep-learning-based time series classification algorithms, convolutional neural networks, in particular ResNet- and Inception-based architectures, show the strongest performance across all tasks. We find consistent results on the ICBEB2018 challenge ECG dataset and discuss the prospects of transfer learning using classifiers pretrained on PTB-XL. These benchmarking results are complemented by deeper insights into the classification algorithms in terms of hidden stratification, model uncertainty and an exploratory interpretability analysis, which provide starting points for future research on the dataset. Our results emphasize the potential of deep-learning-based algorithms in ECG analysis, not only in terms of quantitative accuracy but also in terms of further quality criteria of equal clinical importance, such as uncertainty quantification and interpretability. With this work, we aim to establish PTB-XL as a resource for structured benchmarking of ECG analysis algorithms and encourage other researchers in the field to join these efforts.
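To make the idea of a well-defined evaluation procedure for multi-label ECG statement prediction more concrete, here is a minimal sketch of one plausible metric: the macro-averaged ROC AUC, i.e. the unweighted mean of per-statement AUCs. The abstract does not name the metric, so this choice is an assumption, and the function names `binary_auc` and `macro_auc` are hypothetical. Each per-label AUC is computed with the Mann-Whitney U formulation (the fraction of positive/negative pairs ranked correctly, ties counting one half).

```python
from typing import Sequence


def binary_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC for a single label via the Mann-Whitney U statistic (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined without both positive and negative samples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(y_true: Sequence[Sequence[int]], y_score: Sequence[Sequence[float]]) -> float:
    """Unweighted mean of per-label AUCs for a multi-label problem (hypothetical helper).

    y_true[i][k] is 1 if record i carries ECG statement k, else 0;
    y_score[i][k] is the model's predicted score for that statement.
    """
    n_labels = len(y_true[0])
    aucs = []
    for k in range(n_labels):
        col_true = [row[k] for row in y_true]
        col_score = [row[k] for row in y_score]
        aucs.append(binary_auc(col_true, col_score))
    return sum(aucs) / n_labels


if __name__ == "__main__":
    # Toy example: 4 records, 2 statements (made-up numbers, not PTB-XL data).
    y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
    y_score = [[0.9, 0.5], [0.3, 0.8], [0.7, 0.3], [0.1, 0.4]]
    print(macro_auc(y_true, y_score))  # 0.75: statement 0 scores 1.0, statement 1 scores 0.5
```

Macro averaging weights every statement equally, so rare diagnostic statements influence the score as much as frequent ones. For real use, scikit-learn's `roc_auc_score` with `average="macro"` on a multilabel indicator matrix computes the same quantity.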
