A Machine Learning Approach for the Automated Interpretation of Plasma Amino Acid Profiles.

BACKGROUND Plasma amino acid (PAA) profiles are used in routine clinical practice for the diagnosis and monitoring of inherited disorders of amino acid metabolism, organic acidemias, and urea cycle defects. Interpretation of PAA profiles is complex and requires substantial training and expertise. Given previous demonstrations of the ability of machine learning (ML) algorithms to interpret complex clinical biochemistry data, we sought to determine whether ML-derived classifiers could interpret PAA profiles with high predictive performance.

METHODS We collected data from PAA profiling routinely performed in a clinical biochemistry laboratory (2084 profiles) and developed decision support classifiers with several ML algorithms. We tested the generalization performance of each classifier using a nested cross-validation (CV) procedure and examined the effect of various subsampling, feature selection, and ensemble learning strategies.

RESULTS The classifiers demonstrated excellent predictive performance, with the 3 ML algorithms tested producing comparable results. The best-performing ensemble binary classifier achieved a mean precision-recall (PR) AUC of 0.957 (95% CI 0.952, 0.962), and the best-performing ensemble multiclass classifier achieved a mean F4 score of 0.788 (95% CI 0.773, 0.803).

CONCLUSIONS This work builds upon previous demonstrations of the utility of ML-derived decision support tools in clinical biochemistry laboratories. Our findings suggest that, pending additional validation studies, such tools could be used in routine clinical practice to streamline and aid the interpretation of PAA profiles. This would be particularly useful in laboratories with limited resources and large workloads. We provide the necessary code for other laboratories to develop their own decision support tools.
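The F4 score reported for the multiclass classifier is an instance of the general F-beta measure, which combines precision and recall and, for beta greater than 1, weights recall more heavily than precision. The minimal Python sketch below shows that calculation from confusion counts for a single class. The function name `f_beta` and the example counts are hypothetical and not taken from the study, and the abstract does not state how per-class scores were averaged, so no averaging scheme is implied here.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 4.0) -> float:
    """Return the F-beta score for one class from confusion counts.

    tp, fp, fn are true positives, false positives, and false negatives.
    beta > 1 favours recall; beta = 1 gives the familiar F1 score.
    (Hypothetical helper for illustration, not the authors' code.)
    """
    if tp == 0:
        # With no true positives, precision and recall are both zero
        # (or undefined), so the score is conventionally reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Made-up counts: 80 abnormal profiles flagged correctly,
    # 40 false alarms, 20 abnormal profiles missed.
    print(f"F1 = {f_beta(80, 40, 20, beta=1.0):.3f}")
    print(f"F4 = {f_beta(80, 40, 20, beta=4.0):.3f}")
```

With beta = 4, a missed abnormal profile (false negative) costs the score far more than a false alarm, which is one plausible reason to prefer this measure for a screening-style decision support tool.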
