Random forest classifier improving phenylketonuria screening performance in two Chinese populations

Phenylketonuria (PKU) is an inherited disorder of amino acid metabolism that severely impairs the development of newborns and children. Early diagnosis and treatment can effectively prevent disease progression. Here we developed a PKU screening model based on a random forest classifier (RFC) to improve screening performance. The model achieved excellent sensitivity, false positive rate (FPR), and positive predictive value (PPV) on the validation dataset and in two Chinese test populations. RFC outperformed several other machine learning classification models as well as the traditional logistic regression model. RFC is therefore a promising approach for neonatal PKU screening.
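The three metrics named in the abstract can be made concrete with a short sketch. The function name `screening_metrics` and the toy labels below are illustrative assumptions, not data or code from the study; the sketch only shows how sensitivity, FPR, and PPV follow from the counts of true and false positives and negatives.

```python
def screening_metrics(y_true, y_pred):
    """Return (sensitivity, FPR, PPV) for binary labels, where 1 = PKU and 0 = unaffected."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # affected, flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # affected, missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # unaffected, flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # unaffected, cleared

    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan  # share of true cases detected
    fpr = fp / (fp + tn) if (fp + tn) else nan          # share of unaffected newborns flagged
    ppv = tp / (tp + fp) if (tp + fp) else nan          # share of flagged newborns truly affected
    return sensitivity, fpr, ppv


# Made-up toy example (hypothetical, not study data): 2 affected and 8 unaffected newborns.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
sens, fpr, ppv = screening_metrics(y_true, y_pred)
print(f"sensitivity={sens:.3f}, FPR={fpr:.3f}, PPV={ppv:.3f}")
```

Because PKU is rare, even a small FPR can produce many more false positives than true cases, which is why PPV is reported alongside sensitivity and FPR in screening settings.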
