PTB-XL, a large publicly available electrocardiography dataset

Electrocardiography (ECG) is a key non-invasive diagnostic tool for cardiovascular diseases that is increasingly supported by algorithms based on machine learning. Major obstacles to the development of automatic ECG interpretation algorithms are the lack of both public datasets and well-defined benchmarking procedures that allow different algorithms to be compared. To address these issues, we put forward PTB-XL, to date the largest freely accessible clinical 12-lead ECG-waveform dataset, comprising 21837 records of 10 seconds length from 18885 patients. The ECG-waveform data was annotated by up to two cardiologists as a multi-label dataset, where diagnostic labels were further aggregated into superclasses and subclasses. The dataset covers a broad range of diagnostic classes including, in particular, a large fraction of healthy records. Combined with additional metadata on demographics, additional diagnostic statements, diagnosis likelihoods, manually annotated signal properties, and suggested folds for splitting training and test sets, this makes the dataset a rich resource for the development and evaluation of automatic ECG interpretation algorithms.

Measurement(s): electrocardiography • cardiovascular system
Technology Type(s): 12 lead electrocardiography
Factor Type(s): presence of co-occurring diseases
Sample Characteristic - Organism: Homo sapiens

Machine-accessible metadata file describing the reported data: https://doi.org/10.6084/m9.figshare.12098055
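As an illustration of how suggested folds and multi-label annotations might be used together, the following is a minimal sketch only. The record layout, the field names `id`, `labels` and `fold`, the label strings, and the choice of fold 3 as the test fold are all assumptions made for this example and are not taken from the dataset description above.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: one dict per ECG record, with a list of
# diagnostic labels (multi-label) and a pre-assigned fold number.
Record = Dict[str, object]


def split_by_fold(
    records: Sequence[Record], test_folds: Sequence[int]
) -> Tuple[List[Record], List[Record]]:
    """Put records whose fold is in test_folds into the test set, the rest into train."""
    held_out = set(test_folds)
    train = [r for r in records if r["fold"] not in held_out]
    test = [r for r in records if r["fold"] in held_out]
    return train, test


def multi_hot(labels: Sequence[str], classes: Sequence[str]) -> List[int]:
    """Encode a record's label list as a 0/1 vector over a fixed class order."""
    present = set(labels)
    return [1 if c in present else 0 for c in classes]


# Toy data with placeholder label names (illustrative, not real annotations).
records: List[Record] = [
    {"id": 1, "labels": ["NORM"], "fold": 1},
    {"id": 2, "labels": ["MI", "STTC"], "fold": 2},
    {"id": 3, "labels": ["CD"], "fold": 3},
    {"id": 4, "labels": ["NORM"], "fold": 3},
]
classes = ["CD", "MI", "NORM", "STTC"]

train, test = split_by_fold(records, test_folds=[3])
train_targets = [multi_hot(r["labels"], classes) for r in train]
```

Because the dataset contains more records than patients, a split of this kind would normally be expected to keep all records of one patient in the same fold. Whether the suggested folds guarantee this is not stated in the text above, so it should be checked before relying on the split.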
