Modelling of classification rules on metabolic patterns including machine learning and expert knowledge

Machine learning offers great potential for mining candidate markers from high-dimensional metabolic data without any a priori knowledge. As an example, we investigated the metabolic patterns of three severe metabolic disorders, phenylalanine hydroxylase deficiency (PAHD), medium-chain acyl-CoA dehydrogenase deficiency (MCADD), and 3-methylcrotonyl-CoA carboxylase deficiency (3-MCCD), and constructed classification models for disease screening and diagnosis using a decision tree paradigm and logistic regression analysis (LRA). For the LRA model-building process, we assessed the relevance of established diagnostic flags, which have been developed from biochemical knowledge of newborn metabolism, and compared the models' error rates with those of the decision tree classifier. Both approaches yielded comparable classification accuracy in terms of sensitivity (>95.2%), while the LRA models built on flags showed significantly enhanced specificity. The proportion of false positive cases did not exceed 0.001%.
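To make the two reported error measures concrete, the following minimal sketch shows how sensitivity and specificity can be computed for a simple ratio-based diagnostic flag. It is an illustration only, not the models described in the abstract: the function names, the cutoff value, and the toy concentrations are all hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = affected)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one affected and one unaffected case")
    return tp / (tp + fn), tn / (tn + fp)


def ratio_flag(numerator: float, denominator: float, cutoff: float) -> int:
    """Hypothetical flag: 1 if the concentration ratio exceeds the cutoff."""
    return 1 if numerator / denominator > cutoff else 0


# Toy data, invented for illustration: (metabolite A, metabolite B, label).
samples = [
    (600.0, 50.0, 1),
    (450.0, 60.0, 1),
    (60.0, 55.0, 0),
    (80.0, 70.0, 0),
    (150.0, 40.0, 0),
    (70.0, 90.0, 0),
]
HYPOTHETICAL_CUTOFF = 3.0  # assumed value, not a clinical threshold

y_true = [label for _, _, label in samples]
y_pred = [ratio_flag(a, b, HYPOTHETICAL_CUTOFF) for a, b, _ in samples]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy set both affected cases are flagged, while one of the four unaffected cases is flagged falsely, which is the kind of trade-off the comparison of specificity between classifiers is meant to quantify.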
