Supervised machine learning techniques for the classification of metabolic disorders in newborns

MOTIVATION: During the Bavarian newborn screening programme, all newborns were tested for about 20 inherited metabolic disorders. Given the amount and complexity of the experimental data generated, machine learning techniques offer a promising approach to discovering novel patterns in high-dimensional metabolic data. These patterns can then serve as the basis for constructing classification rules with high discriminatory power.

RESULTS: Six machine learning techniques were evaluated for their classification accuracy on two metabolic disorders, phenylketonuria (PKU) and medium-chain acyl-CoA dehydrogenase deficiency (MCADD). Logistic regression analysis yielded classification rules (sensitivity >96.8%, specificity >99.98%) superior to those of all other investigated algorithms. By including novel combinations of metabolites in the models, the positive predictive value was increased substantially compared with the established diagnostic markers (PKU 71.9% versus 16.2%, MCADD 88.4% versus 54.6%). Our results show that the mined data confirm known metabolic patterns and indicate some novel ones, which may contribute to a better understanding of newborn metabolism.
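The abstract reports sensitivity, specificity and positive predictive value (PPV). The following minimal Python sketch shows how these three measures are computed from a screening confusion matrix. It also uses Bayes' theorem to illustrate why, for rare disorders, PPV can change dramatically even when specificity is already above 99.9%. The class and function names are our own. The prevalence and rates in the usage example are hypothetical values chosen for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Screening outcomes compared against confirmed diagnoses (hypothetical helper)."""
    tp: int  # affected newborns flagged as positive
    fn: int  # affected newborns missed by the screen
    fp: int  # unaffected newborns flagged as positive
    tn: int  # unaffected newborns correctly cleared


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of affected newborns that the screen detects."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of unaffected newborns that the screen clears."""
    return c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Fraction of positive screens that are truly affected."""
    return c.tp / (c.tp + c.fp)


def ppv_from_rates(sens: float, spec: float, prevalence: float) -> float:
    """PPV via Bayes' theorem from sensitivity, specificity and prevalence."""
    true_pos = sens * prevalence
    false_pos = (1.0 - spec) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    # Hypothetical prevalence of 1 in 10,000 (assumption, not from the study).
    prevalence = 1e-4
    for spec in (0.9998, 0.99996):
        value = ppv_from_rates(0.968, spec, prevalence)
        print(f"specificity={spec}: PPV={value:.3f}")
```

With these assumed inputs, raising specificity from 99.98% to 99.996% roughly doubles the PPV. At such low prevalence, almost every false positive comes from the very large unaffected population. This is why removing even a handful of false positives matters so much in newborn screening.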
