Application of statistics and machine learning for risk stratification of heritable cardiac arrhythmias

In the clinical management of heritable cardiac arrhythmias (HCAs), risk stratification is of prime importance. Predicting the likelihood that individuals within a sub-population will develop a condition that may lead to sudden death gives them the opportunity to put preventive measures in place and make the lifestyle adjustments needed to improve their chances of survival. In this paper, we review classical methods commonly used in clinical studies for risk stratification in HCAs, such as odds ratios, hazard ratios, chi-squared tests and logistic regression, and discuss their benefits and shortcomings. We then explore less common and more recent statistical and machine learning methods adopted by other biological studies and assess their applicability to the study of HCAs. These methods, which include decision trees, neural networks, support vector machines and Bayesian classifiers, typically support multivariate analysis of risk factors. They have been used for feature selection of predictor variables in risk stratification studies and, in some cases, have outperformed classical methods.
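As a minimal illustration of two of the classical univariate measures named above, the sketch below computes an odds ratio with a Woolf (log-scale) 95% confidence interval and a Pearson chi-squared test for a single binary risk marker against a binary outcome such as an arrhythmic event. The `TwoByTwo` class, the function names and the counts are hypothetical and do not come from any study; the code assumes no zero cells and omits Yates' continuity correction.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical 2x2 table: binary risk marker vs. binary outcome."""
    a: int  # marker present, event
    b: int  # marker present, no event
    c: int  # marker absent, event
    d: int  # marker absent, no event


def odds_ratio_ci(t: TwoByTwo, z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Woolf confidence interval (assumes no zero cells)."""
    or_ = (t.a * t.d) / (t.b * t.c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log = math.sqrt(1 / t.a + 1 / t.b + 1 / t.c + 1 / t.d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


def chi_squared_2x2(t: TwoByTwo) -> tuple[float, float]:
    """Pearson chi-squared statistic (no continuity correction) and 1-df p-value."""
    n = t.a + t.b + t.c + t.d
    stat = n * (t.a * t.d - t.b * t.c) ** 2 / (
        (t.a + t.b) * (t.c + t.d) * (t.a + t.c) * (t.b + t.d)
    )
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so P(X > stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


if __name__ == "__main__":
    table = TwoByTwo(a=12, b=38, c=8, d=142)  # illustrative counts only
    print(odds_ratio_ci(table))
    print(chi_squared_2x2(table))
```

For these illustrative counts the odds ratio is 1704/304 ≈ 5.61 and the chi-squared statistic is 392/27 ≈ 14.52. Because each call examines one marker at a time, this kind of analysis cannot capture joint effects between risk factors, which is the gap the multivariate methods discussed above aim to fill.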
