Classification on high-dimensional metabolic data: Phenylketonuria as an example

Tandem mass spectrometry is a promising new screening technology that permits screening, within a single analytical run, not only for phenylketonuria (PKU) but also for a wide range of other metabolic disorders in newborns. We investigated two symbolic supervised machine learning techniques, logistic regression analysis (LRA) and decision trees (DT), in which knowledge is represented explicitly, to find classification rules for the presence of PKU. Our experiments were performed on pre-classified newborn screening data comprising a metabolite spectrum of 14 amino acids. Both the LRA and DT classifiers showed high classification performance, with a sensitivity of ≥ 97.7% and a specificity of ≥ 99.8%. In addition to the established diagnostic metabolites phenylalanine and tyrosine, we also included alternative constellations of metabolites in our models, which showed comparable predictive power. The presented machine learning techniques are appropriate for investigating metabolic patterns in newborn screening data and for constructing classification models for PKU.
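
For readers unfamiliar with the two performance measures, sensitivity is the fraction of true PKU cases a classifier flags, and specificity is the fraction of controls it correctly leaves unflagged. The following is a minimal sketch of how both could be computed from true and predicted labels. The function name `sensitivity_specificity` and the toy labels are illustrative assumptions, not code or data from the study; 1 denotes PKU and 0 denotes a control.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = PKU, 0 = control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # PKU cases flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # PKU cases missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # controls left unflagged
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # controls flagged in error
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels for illustration only (hypothetical, not screening data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

In a screening setting, controls vastly outnumber cases, so even a small drop in specificity translates into many false positives; this is why both measures are reported rather than overall accuracy.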