Improving genomics-based predictions for precision medicine through active elicitation of expert knowledge

Motivation: Precision medicine requires the ability to predict the efficacies of different treatments for a given individual from high-dimensional genomic measurements. However, identifying predictive features remains a challenge when the sample size is small. Incorporating expert knowledge is a promising way to improve predictions, but collecting such knowledge is laborious when the number of candidate features is very large.

Results: We introduce a probabilistic framework that incorporates expert feedback about the impact of genomic measurements on the outcome of interest, and we present a novel approach, based on Bayesian experimental design, for collecting this feedback efficiently. The new approach outperformed other recent alternatives in two medical applications, both using genomic features as predictors: prediction of metabolic traits, and prediction of the sensitivity of cancer cells to different drugs. Furthermore, the active approach to collecting feedback reduced the expert's workload to approximately 11% of that required by a baseline approach.

Availability and implementation: Source code implementing the introduced computational methods is freely available at https://github.com/AaltoPML/knowledge-elicitation-for-precision-medicine.
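The Results summary describes choosing which features to ask the expert about by Bayesian experimental design. The following is a minimal sketch of such a sequential query loop under simplifying assumptions that are ours, not the paper's: a Gaussian prior on the regression weights (so every posterior is closed-form), expert feedback modeled as a noisy direct observation f_j = w_j + eta of a single weight, and a query criterion equal to the expected information gain about the predictive distribution of the training outputs. All function names, the noise levels, and the `ask_expert` callback are hypothetical; this is not the authors' implementation, which is available in the linked repository.

```python
import numpy as np


def gaussian_posterior(X, y, noise_var=1.0, prior_var=1.0):
    """Closed-form posterior N(mu, Sigma) over the weights w of y = X w + eps,
    eps ~ N(0, noise_var), under the assumed prior w ~ N(0, prior_var * I)."""
    p = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(p) / prior_var
    Sigma = np.linalg.inv(precision)
    mu = Sigma @ X.T @ y / noise_var
    return mu, Sigma


def expected_info_gain(X, Sigma, noise_var=1.0, fb_var=0.1):
    """For every feature j, the expected information gain about the predictive
    distribution of the training outputs, y_new = X w + eps, from one feedback
    f_j = w_j + eta with eta ~ N(0, fb_var). Everything is jointly Gaussian, so
    the gain is 0.5 * log(Var(f_j) / Var(f_j | y_new))."""
    C = Sigma @ X.T                                # Cov(w, y_new), shape (p, n)
    V = X @ C + noise_var * np.eye(X.shape[0])     # Var(y_new), shape (n, n)
    W = np.linalg.solve(V, C.T)                    # V^{-1} Cov(y_new, w), shape (n, p)
    explained = np.sum(C * W.T, axis=1)            # Cov(w_j, y) V^{-1} Cov(y, w_j)
    var_f = np.diag(Sigma) + fb_var                # Var(f_j)
    var_f_given_y = var_f - explained              # Var(f_j | y_new)
    return 0.5 * np.log(var_f / var_f_given_y)


def condition_on_feedback(mu, Sigma, j, f, fb_var=0.1):
    """Exact rank-one Gaussian update of N(mu, Sigma) after observing
    f = w_j + eta, eta ~ N(0, fb_var)."""
    s = Sigma[:, j].copy()                         # Cov(w, f)
    denom = Sigma[j, j] + fb_var                   # Var(f)
    mu_new = mu + s * (f - mu[j]) / denom
    Sigma_new = Sigma - np.outer(s, s) / denom
    return mu_new, Sigma_new


def elicit(X, y, ask_expert, n_queries, noise_var=1.0, prior_var=1.0, fb_var=0.1):
    """Greedy sequential design: at each step ask about the not-yet-queried
    feature with the largest expected information gain, then update.
    `ask_expert(j)` is a hypothetical callback returning the expert's value for w_j."""
    mu, Sigma = gaussian_posterior(X, y, noise_var, prior_var)
    asked = np.zeros(X.shape[1], dtype=bool)
    order = []
    for _ in range(n_queries):
        gain = expected_info_gain(X, Sigma, noise_var, fb_var)
        gain[asked] = -np.inf                      # each feature is queried at most once
        j = int(np.argmax(gain))
        mu, Sigma = condition_on_feedback(mu, Sigma, j, ask_expert(j), fb_var)
        asked[j] = True
        order.append(j)
    return mu, Sigma, order


# Usage on synthetic data with few samples and many features (n << p),
# with a simulated, noise-free expert who reports the true weight.
rng = np.random.default_rng(0)
n, p = 10, 50
w_true = np.zeros(p)
w_true[:3] = [2.0, -1.5, 1.0]
X = rng.standard_normal((n, p))
y = X @ w_true + rng.standard_normal(n)
mu, Sigma, order = elicit(X, y, ask_expert=lambda j: w_true[j], n_queries=5)
print("queried features:", order)
```

Because the prior here is Gaussian, each piece of feedback is absorbed by an exact rank-one update; sparsity-inducing priors of the kind commonly used for small-sample genomic regression would instead require approximate posterior inference at each step.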
