Virtual genetic diagnosis for familial hypercholesterolemia powered by machine learning

Aims
Familial hypercholesterolemia (FH) is the most common genetic disorder of lipid metabolism. The gold standard for FH diagnosis is genetic testing, which, however, is available only in selected university hospitals. Clinical scores, such as the Dutch Lipid Score, are therefore often used as more accessible, albeit less accurate, diagnostic tools. The aim of this study is to obtain a more reliable FH diagnosis through a "virtual" genetic test based on machine learning.

Methods and results
We used three machine-learning algorithms, a classification tree (CT), a gradient boosting machine (GBM) and a neural network (NN), to predict the presence of FH-causative genetic mutations in two independent FH cohorts: the FH Gothenburg cohort, split into a training set (N = 174) and an internal test set (N = 74), and the FH-CEGP Milan cohort, used as an external test set (N = 364). Judged by the area under the receiver operating characteristic curve (AUROC), all three algorithms predicted carriers of FH-causative mutations better than the clinical Dutch Lipid Score. On the Gothenburg cohort, the AUROC was 0.79 for CT, 0.83 for GBM and 0.83 for NN, versus 0.68 for the Dutch Lipid Score; on the Milan cohort, it was 0.70 for CT, 0.78 for GBM and 0.76 for NN, versus 0.64 for the Dutch Lipid Score.

Conclusion
In identifying carriers of FH-causative mutations, all three machine-learning approaches we tested outperform the Dutch Lipid Score, the current clinical standard. We expect these algorithms to provide the tools for implementing a virtual genetic test for FH, which might prove particularly valuable for lipid clinics without access to genetic testing.
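The comparison between the machine-learning models and the Dutch Lipid Score rests entirely on AUROC. As an illustration only (not the authors' evaluation code), the sketch below computes AUROC through its rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen mutation carrier receives a higher score than a randomly chosen non-carrier, with ties counted as one half. The function name `auroc`, the 1/0 label encoding and the toy data are hypothetical assumptions; the same function applies whether the score is a model's predicted probability or an integer clinical score.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic (pairwise comparison).

    labels: 1 = carrier of an FH-causative mutation, 0 = non-carrier
            (hypothetical encoding, chosen for this sketch).
    scores: any score where higher means "more likely a carrier", e.g. a
            model's predicted probability or a clinical point score.
    """
    carriers = [s for y, s in zip(labels, scores) if y == 1]
    non_carriers = [s for y, s in zip(labels, scores) if y == 0]
    if not carriers or not non_carriers:
        raise ValueError("AUROC needs at least one carrier and one non-carrier")

    # Count carrier/non-carrier pairs that the score ranks correctly;
    # tied pairs contribute 0.5, as in the standard ROC definition.
    correct = 0.0
    for c in carriers:
        for n in non_carriers:
            if c > n:
                correct += 1.0
            elif c == n:
                correct += 0.5
    return correct / (len(carriers) * len(non_carriers))


if __name__ == "__main__":
    # Made-up toy data, NOT taken from the Gothenburg or Milan cohorts.
    labels = [1, 1, 0, 0, 1, 0]
    model_probability = [0.9, 0.7, 0.4, 0.2, 0.3, 0.6]  # e.g. a trained classifier
    clinical_points = [8, 5, 5, 3, 4, 6]                 # e.g. an integer clinical score
    print(round(auroc(labels, model_probability), 3))    # -> 0.778
    print(round(auroc(labels, clinical_points), 3))      # -> 0.611
```

The pairwise loop is quadratic in cohort size, which is negligible for a few hundred patients; for large datasets a sort-based rank computation (or a library routine such as scikit-learn's `roc_auc_score`) gives the same value more efficiently.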
