EDDI: Efficient Dynamic Discovery of High-Value Information with Partial VAE

Many real-life decision-making situations allow further relevant information to be acquired at a specific cost; for example, in assessing the health status of a patient we may decide to take additional measurements, such as diagnostic tests or imaging scans, before making a final assessment. Acquiring more relevant information enables better decision making, but may be costly. How can we trade off the desire to make good decisions by acquiring further information against the cost of performing that acquisition? To this end, we propose a principled framework, named EDDI (Efficient Dynamic Discovery of high-value Information), based on the theory of Bayesian experimental design. In EDDI, we propose a novel partial variational autoencoder (Partial VAE) to predict missing data entries probabilistically given any subset of the observed ones, and combine it with an acquisition function that maximizes expected information gain on a set of target variables. We show cost reduction at the same decision quality and improved decision quality at the same cost in multiple machine learning benchmarks and two real-world health-care applications.
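The acquisition step described above can be sketched in a few lines. This is a hypothetical toy illustration, not the paper's implementation: the `predictive_samples` function stands in for the Partial VAE's conditional predictive distribution over the target given any subset of observed features, and the expected information gain is approximated by Monte Carlo entropy estimates under a Gaussian approximation. All function names, the toy predictive model, and the feature names are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def predictive_samples(observed, candidate=None, value=None, n=200):
    # Stand-in for the Partial VAE's predictive distribution p(y | x_o):
    # a toy Gaussian whose variance shrinks as more features are observed
    # (purely illustrative, not the model from the paper).
    known = dict(observed)
    if candidate is not None:
        known[candidate] = value
    mean = sum(known.values()) / max(len(known), 1)
    std = 1.0 / (1.0 + len(known))
    return rng.normal(mean, std, size=n)

def entropy_estimate(samples):
    # Differential entropy of a 1-D sample set under a Gaussian approximation.
    return 0.5 * np.log(2 * np.pi * np.e * (np.var(samples) + 1e-8))

def expected_information_gain(observed, candidate, n_outer=20):
    # EIG(i) ~= H[p(y | x_o)] - E_{x_i ~ p(x_i | x_o)} H[p(y | x_o, x_i)]:
    # how much observing feature `candidate` is expected to reduce
    # uncertainty about the target.
    h_prior = entropy_estimate(predictive_samples(observed))
    h_post = 0.0
    for _ in range(n_outer):
        x_i = rng.normal()  # imagined draw of the unobserved feature value
        h_post += entropy_estimate(predictive_samples(observed, candidate, x_i))
    return h_prior - h_post / n_outer

# Greedily pick the next feature to acquire: the one with the highest
# expected information gain about the target.
observed = {"age": 0.3}
candidates = ["blood_pressure", "heart_rate"]
best = max(candidates, key=lambda c: expected_information_gain(observed, c))
```

In the actual EDDI framework this greedy loop would repeat, re-estimating information gain after each acquired feature, and the acquisition score would be traded off against each feature's cost.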
