An Ensemble Learning Approach for Enhanced Classification of Patients With Hepatitis and Cirrhosis

Hepatitis C is an infectious disease that affects more than 70 million people worldwide and kills about 400,000 of them every year. To better understand this disease and its prognosis, physicians can take advantage of patients' electronic health records (EHRs). Computational approaches based on statistics and computational intelligence can process these data to reveal discoveries and trends that physicians would otherwise miss. In this study, we analyze the EHRs of 540 healthy controls and 75 patients diagnosed with hepatitis C, and use machine learning classifiers to predict their diagnosis. We then use the best-performing classifier (Random Forests) to identify the variables most relevant to the hepatitis C diagnosis: aspartate aminotransferase (AST) and alanine aminotransferase (ALT). Physicians already combine these two enzyme levels in the AST/ALT ratio, a traditional measure commonly used in gastroenterology and hepatology. We apply the same approach to a validation dataset of 123 patients with hepatitis C and cirrhosis, and the same two variables emerge as the most relevant. We therefore compare our approach with the AST/ALT ratio and find that our two-feature ensemble learning model outperforms the traditional AST/ALT ratio on both datasets. Our results confirm the usefulness of ensemble machine learning for predicting hepatitis C and cirrhosis diagnoses. Moreover, our findings could influence clinical practice by helping physicians predict more precisely the diagnoses of patients at risk of hepatitis C and cirrhosis.
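The comparison between a two-feature ensemble model and the AST/ALT ratio can be illustrated with a small pure-Python sketch. It is not the authors' implementation, and several parts are assumptions: the ratio rule uses the classic cutoff AST/ALT >= 1; the ensemble is a deliberately simplified random-forest-style model (bagged one-split trees, each fit on a bootstrap sample using one randomly chosen feature) rather than full Random Forests; the Matthews correlation coefficient (MCC) is assumed as the metric because the classes are imbalanced; and all function names and toy data are hypothetical.

```python
import math
import random

# Illustrative sketch only: a pure-Python stand-in, not the study's pipeline.
# The cutoff, the ensemble design, and the toy data are assumptions.


def ratio_classifier(samples, cutoff=1.0):
    """Predict 1 (disease) when AST/ALT >= cutoff, else 0 (control).

    samples: list of (ast, alt) pairs in the same unit (e.g. U/L).
    """
    return [1 if ast / alt >= cutoff else 0 for ast, alt in samples]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention: MCC is 0 when any confusion-matrix margin is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def stump_predict(x, feature, thr, polarity):
    """A one-split tree: predict 1 when polarity * (x[feature] - thr) >= 0."""
    return 1 if polarity * (x[feature] - thr) >= 0 else 0


def fit_stump(samples, labels, feature):
    """Pick the (threshold, polarity) on one feature with the fewest training errors."""
    best = None
    for thr in sorted({s[feature] for s in samples}):
        for polarity in (1, -1):
            errors = sum(
                stump_predict(s, feature, thr, polarity) != y
                for s, y in zip(samples, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, thr, polarity)
    return best[1], best[2]


def fit_stump_ensemble(samples, labels, n_trees=25, seed=0):
    """Bagged decision stumps, each fit on a bootstrap sample using one
    randomly chosen feature (a simplified random-forest-style ensemble)."""
    rng = random.Random(seed)
    n, n_features = len(samples), len(samples[0])
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: n draws with replacement
        feature = rng.randrange(n_features)  # random feature for this stump
        thr, polarity = fit_stump(
            [samples[i] for i in idx], [labels[i] for i in idx], feature
        )
        stumps.append((feature, thr, polarity))
    return stumps


def predict_ensemble(stumps, samples):
    """Majority vote over all stumps; ties (possible with even n_trees) go to class 0."""
    return [
        1 if 2 * sum(stump_predict(x, f, t, p) for f, t, p in stumps) > len(stumps) else 0
        for x in samples
    ]


# Hypothetical toy data (NOT the study's data): patients have high AST and ALT,
# but AST < ALT, so the ratio rule with cutoff 1.0 misses them.
controls = [(18 + i % 10, 30 + i % 10) for i in range(30)]
patients = [(60 + 3 * i, 80 + 4 * i) for i in range(10)]
X_train = controls + patients
y_train = [0] * len(controls) + [1] * len(patients)

X_test = [(22, 34), (25, 36), (90, 120), (100, 130)]
y_test = [0, 0, 1, 1]

stumps = fit_stump_ensemble(X_train, y_train)
ratio_mcc = mcc(y_test, ratio_classifier(X_test))
ensemble_mcc = mcc(y_test, predict_ensemble(stumps, X_test))
print(f"AST/ALT ratio MCC: {ratio_mcc:.2f}, two-feature ensemble MCC: {ensemble_mcc:.2f}")
```

On this toy data every patient has AST below ALT, so the ratio rule labels all held-out cases as controls (MCC 0.00), whereas the stumps, which threshold the raw enzyme levels, separate them (MCC 1.00). The example only shows that absolute AST and ALT values can carry information that their ratio discards; it says nothing about the magnitude of the improvement reported on the real datasets.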
