A web-based automated machine learning platform to analyze liquid biopsy data.

Liquid biopsy (LB) technologies continue to improve in sensitivity, specificity, and multiplexing, and can measure an ever-growing library of disease biomarkers. However, clinical interpretation of the increasingly large datasets these technologies generate remains a challenge. Machine learning is a popular approach for discovering and detecting signatures of disease, but limited machine learning expertise in the LB field has kept the discipline from fully leveraging these tools and creates a risk of improper analyses and irreproducible results. In this paper, we develop a web-based automated machine learning tool tailored specifically for LB, in which machine learning models are built without requiring input from the user. We also incorporate a differential privacy algorithm designed to limit the overfitting that can arise when users iteratively develop a panel using feedback from our platform. We validate our approach with a meta-analysis of 11 published LB datasets and find that its performance is similar to or better than that reported in the literature. Moreover, we show that our platform's performance improves when it incorporates information from prior LB datasets, suggesting that the approach can continue to improve with increased access to LB data. Finally, we show that our platform can match the results achieved in the literature using only 40% of the subjects in the training set, potentially reducing study cost and time. This self-improving and overfitting-resistant automated machine learning platform provides a new standard that can be used to validate machine learning work in the LB field.
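The abstract does not say which differential privacy algorithm the platform uses. As an illustration only, the sketch below shows a simplified form of Thresholdout, a well-known mechanism from the reusable-holdout literature (Dwork et al., 2015), which lets a user query a holdout set repeatedly while limiting how much is learned about it. The function name, the default threshold and noise scale, and the omission of a query budget are assumptions made for this example, not details of the platform.

```python
import numpy as np


def thresholdout(train_score, holdout_score, threshold=0.04, sigma=0.01, rng=None):
    """Simplified Thresholdout sketch (hypothetical helper, not the platform's code).

    Returns the training score when it agrees with the holdout score to within
    a noisy threshold; otherwise returns the holdout score with Laplace noise.
    The full algorithm also tracks a budget of holdout accesses, omitted here.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Noise the threshold and the comparison so that the decision itself
    # reveals little about the holdout set.
    noisy_threshold = threshold + rng.laplace(0.0, 2.0 * sigma)
    gap = abs(holdout_score - train_score)

    if gap > noisy_threshold + rng.laplace(0.0, 4.0 * sigma):
        # The scores disagree: the model is likely overfit to the training
        # data, so report a noised holdout score instead.
        return holdout_score + rng.laplace(0.0, sigma)

    # The scores agree: answering with the training score leaks nothing
    # further about the holdout set.
    return train_score


# Example usage with made-up accuracies for one candidate biomarker panel.
reported = thresholdout(train_score=0.95, holdout_score=0.70,
                        rng=np.random.default_rng(0))
print(f"score reported to the user: {reported:.3f}")
```

In this scheme the user only ever sees the returned value, so repeatedly refining a panel against the feedback cannot tune it to the exact holdout scores.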
