AutoPrognosis: Automated Clinical Prognostic Modeling via Bayesian Optimization with Structured Kernel Learning

Clinical prognostic models derived from large-scale healthcare data can inform critical diagnostic and therapeutic decisions. To enable off-the-shelf usage of machine learning (ML) in prognostic research, we developed AUTOPROGNOSIS: a system for automating the design of predictive modeling pipelines tailored for clinical prognosis. AUTOPROGNOSIS optimizes ensembles of pipeline configurations efficiently using a novel batched Bayesian optimization (BO) algorithm that learns a low-dimensional decomposition of the pipelines' high-dimensional hyperparameter space concurrently with the BO procedure. This is achieved by modeling the pipelines' performances as a black-box function with a Gaussian process prior, and modeling the similarities between the pipelines' baseline algorithms via a sparse additive kernel with a Dirichlet prior. Meta-learning is used to warm-start BO with external data from similar patient cohorts by calibrating the priors using an algorithm that mimics the empirical Bayes method. The system automatically explains its predictions by presenting clinicians with logical association rules that link patients' features to predicted risk strata. We demonstrate the utility of AUTOPROGNOSIS using 10 major patient cohorts representing various aspects of cardiovascular patient care.
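
To make the core idea concrete, the following is a minimal sketch of batched Bayesian optimization with a Gaussian process surrogate and an additive kernel. It is an illustration under stated assumptions, not the AUTOPROGNOSIS implementation: the grouping of hyperparameter dimensions is fixed by hand here rather than learned under a Dirichlet prior, the batch is chosen by a simple top-k upper-confidence-bound rule, and all names (`additive_kernel`, `gp_posterior`, `ucb_batch`, `toy_objective`, `run_bo`) are hypothetical.

```python
import numpy as np


def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel between the rows of a and b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)


def additive_kernel(a, b, groups):
    """Sum of RBF kernels, one per group of hyperparameter dimensions."""
    return sum(rbf(a[:, g], b[:, g]) for g in groups)


def gp_posterior(x_train, y_train, x_test, groups, noise=1e-4):
    """Zero-mean GP posterior mean and variance at x_test."""
    k_tt = additive_kernel(x_train, x_train, groups)
    k_tt = k_tt + noise * np.eye(len(x_train))
    k_ts = additive_kernel(x_train, x_test, groups)
    k_ss_diag = np.diag(additive_kernel(x_test, x_test, groups))
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_ts)
    mean = k_ts.T @ alpha
    var = np.clip(k_ss_diag - (v**2).sum(0), 0.0, None)
    return mean, var


def ucb_batch(mean, var, batch_size, beta=2.0):
    """Indices of the batch_size candidates with the highest UCB score."""
    score = mean + beta * np.sqrt(var)
    return np.argsort(-score)[:batch_size]


def toy_objective(x):
    """Hypothetical stand-in for a pipeline's validation score."""
    return (
        -((x[:, 0] - 0.3) ** 2)
        - ((x[:, 1] - 0.7) ** 2)
        - ((x[:, 2] - 0.5) ** 2)
    )


def run_bo(n_rounds=5, batch_size=3, seed=0):
    rng = np.random.default_rng(seed)
    groups = [[0], [1, 2]]  # assumed decomposition, fixed rather than learned
    candidates = rng.uniform(size=(200, 3))
    idx = [int(i) for i in rng.choice(len(candidates), size=5, replace=False)]
    for _ in range(n_rounds):
        x_train = candidates[idx]
        y_train = toy_objective(x_train)
        mean, var = gp_posterior(x_train, y_train, candidates, groups)
        # Exclude configurations that were already evaluated.
        mean[idx] = -np.inf
        var[idx] = 0.0
        idx.extend(int(i) for i in ucb_batch(mean, var, batch_size))
    best = max(idx, key=lambda i: toy_objective(candidates[[i]])[0])
    return candidates[best], len(idx)


if __name__ == "__main__":
    best_x, n_evaluated = run_bo()
    print("best configuration:", best_x, "after", n_evaluated, "evaluations")
```

The additive structure is what keeps the surrogate tractable in high dimensions: each kernel term only sees a small subset of the hyperparameters, so the search can exploit the decomposition instead of modeling the full space jointly.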
