Evaluating recommender systems for AI-driven biomedical informatics

MOTIVATION: Many researchers with domain expertise cannot easily apply machine learning to their bioinformatics data because they lack machine learning and/or coding expertise. Most methods proposed so far for automating machine learning still require programming experience, as well as expert knowledge to tune and apply the algorithms correctly. Here, we study a way of automating biomedical data science using a web-based platform that uses AI to recommend model choices and conduct experiments. We have two goals: first, to make it easy to construct sophisticated models of biomedical processes; and second, to provide a fully automated AI agent that chooses and conducts promising experiments for the user, based on both the user's own experiments and prior knowledge. To validate this framework, we test it on hundreds of classification problems and compare it with state-of-the-art automated approaches. Finally, we use the tool to develop predictive models of septic shock in critical care patients.

RESULTS: We find that recommender systems based on matrix factorization outperform meta-learning methods for automating machine learning, mirroring the findings of earlier recommender-systems research in other domains. The proposed AI is competitive with state-of-the-art automated machine learning methods at choosing optimal algorithm configurations for a given dataset. In our application to septic shock prediction, the AI-driven analysis produces a competent machine learning model (AUROC 0.85 ± 0.02) that performs on par with state-of-the-art deep learning results for this task while requiring much less computational effort.

AVAILABILITY: PennAI is free of charge and open-source, distributed under version 3 of the GNU General Public License (GPL).

SUPPLEMENTARY INFORMATION: Software and experiments are available from epistasislab.github.io/pennai.
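To make the recommender framing concrete, the sketch below treats each dataset as a "user" and each algorithm configuration as an "item", with the validation score of a configuration on a dataset playing the role of a rating. It fits a biased matrix factorization (the SVD-style model popularized during the Netflix Prize) to the observed entries by stochastic gradient descent, then recommends the untried configuration with the highest predicted score. This is a minimal illustration under stated assumptions, not PennAI's implementation: the function names `factorize` and `recommend`, the hyperparameters (`rank`, `lr`, `reg`, `epochs`) and the toy score matrix are all hypothetical.

```python
import numpy as np


def factorize(scores, rank=2, lr=0.05, reg=0.01, epochs=500, seed=0):
    """Fit a biased matrix factorization to the observed (non-NaN) entries.

    Model (illustrative): score[d, a] ~ mu + b_d[d] + c_a[a] + P[d] . Q[a],
    trained by SGD with L2 regularization. Returns the dense predicted matrix.
    """
    rng = np.random.default_rng(seed)
    n_datasets, n_configs = scores.shape
    observed = np.argwhere(~np.isnan(scores))  # (dataset, config) index pairs
    mu = np.nanmean(scores)                    # global mean score
    b_d = np.zeros(n_datasets)                 # per-dataset bias ("user" bias)
    c_a = np.zeros(n_configs)                  # per-configuration bias ("item" bias)
    P = 0.1 * rng.standard_normal((n_datasets, rank))  # dataset latent factors
    Q = 0.1 * rng.standard_normal((n_configs, rank))   # configuration latent factors
    for _ in range(epochs):
        for d, a in observed[rng.permutation(len(observed))]:
            err = scores[d, a] - (mu + b_d[d] + c_a[a] + P[d] @ Q[a])
            b_d[d] += lr * (err - reg * b_d[d])
            c_a[a] += lr * (err - reg * c_a[a])
            p_old = P[d].copy()  # update both factor vectors from the same old values
            P[d] += lr * (err * Q[a] - reg * P[d])
            Q[a] += lr * (err * p_old - reg * Q[a])
    return mu + b_d[:, None] + c_a[None, :] + P @ Q.T


def recommend(scores, dataset, **fit_kwargs):
    """Return the index of the untried configuration with the highest predicted score."""
    predicted = factorize(scores, **fit_kwargs)
    untried = np.flatnonzero(np.isnan(scores[dataset]))
    return int(untried[np.argmax(predicted[dataset, untried])])


if __name__ == "__main__":
    # Toy data (hypothetical): rows are datasets, columns are algorithm
    # configurations, entries are validation scores; NaN = not yet run.
    scores = np.array([
        [0.70, 0.80, 0.90],
        [0.60, 0.70, 0.85],
        [0.75, 0.82, 0.95],
        [0.65, np.nan, np.nan],  # new dataset: only configuration 0 tried so far
    ])
    print("next configuration to try on dataset 3:", recommend(scores, dataset=3))
```

On this toy matrix the sketch picks configuration 2 for dataset 3, because that configuration scores highest wherever it has been run and its learned bias dominates the prediction. With richer data, the latent factors are what allow a recommender like this to capture dataset-specific preferences that a single global ranking would miss. One natural way to use it inside an automated agent is to run the chosen configuration, write its score back into the matrix, and refit.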

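Meta-learning approaches to algorithm selection typically work differently from collaborative filtering: instead of using the new dataset's own results, they describe each dataset by meta-features (for example its size, dimensionality or class balance) and transfer rankings from similar datasets. A minimal nearest-neighbour version of that idea might look like the following. It is shown only for contrast and is not one of the specific meta-learning methods evaluated here; the function name `knn_recommend`, the choice of meta-features, `k`, and the toy data are hypothetical.

```python
import numpy as np


def knn_recommend(meta, scores, dataset, k=2):
    """Illustrative meta-learning baseline: rank configurations for `dataset`
    by their mean score on the k datasets nearest to it in meta-feature space."""
    # Standardize each meta-feature so that no single one dominates the distance.
    z = (meta - meta.mean(axis=0)) / (meta.std(axis=0) + 1e-12)
    dist = np.linalg.norm(z - z[dataset], axis=1)
    dist[dataset] = np.inf  # never use the query dataset as its own neighbour
    neighbours = np.argsort(dist)[:k]
    # Average the neighbours' observed scores, ignoring configurations they did not run.
    mean_scores = np.nanmean(scores[neighbours], axis=0)
    return int(np.nanargmax(mean_scores))


if __name__ == "__main__":
    # Hypothetical meta-features per dataset: [log10(n_samples), n_features].
    meta = np.array([[2.0, 10.0], [2.1, 11.0], [5.0, 100.0], [2.05, 10.5]])
    # Hypothetical validation scores; dataset 3 is new, so it has no results yet.
    scores = np.array([[0.90, 0.70], [0.88, 0.72], [0.60, 0.95], [np.nan, np.nan]])
    print("recommended configuration for dataset 3:", knn_recommend(meta, scores, dataset=3))
```

The design difference is that this recommender never looks at the query dataset's own scores, so it cannot correct itself as experiments on that dataset accumulate, and its quality hinges on the meta-features being informative about which algorithms will perform well.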