Icebreaker: Element-wise Efficient Information Acquisition with a Bayesian Deep Latent Gaussian Model

In this paper, we address the ice-start problem, i.e., the challenge of deploying machine learning models when little or no training data is initially available and acquiring each feature element of the data incurs a cost. This setting is representative of real-world machine learning applications; in the health-care domain, for instance, every single measurement comes at a cost. We propose Icebreaker, a principled framework for element-wise training-data acquisition. Icebreaker introduces a fully Bayesian Deep Latent Gaussian Model (BELGAM) with a novel inference method that combines recent advances in amortized inference and stochastic gradient MCMC to enable fast and accurate posterior inference. Exploiting BELGAM's ability to fully quantify model uncertainty, we also propose two information acquisition functions, one for imputation and one for active prediction. We demonstrate that BELGAM performs significantly better than previous variational autoencoder (VAE) based models when the data set is small, using both machine learning benchmarks and real-world recommender-system and health-care applications. Moreover, Icebreaker not only outperforms the baselines but also achieves better test performance with less training data.
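The element-wise acquisition loop described above can be illustrated with a minimal sketch. This is not the paper's method: the random linear predictors below merely stand in for SG-MCMC posterior samples of BELGAM's weights, and the acquisition score is a simple disagreement (predictive-variance) proxy for the expected information gain; all names here (`weight_samples`, `acquisition_scores`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for S posterior samples of model parameters.
# In Icebreaker these would be SG-MCMC samples over BELGAM's weights.
S, D = 20, 5
weight_samples = rng.normal(size=(S, D))

def acquisition_scores(x_observed, mask):
    """Score each unobserved feature element by the variance of
    predictions across posterior samples -- a crude proxy for the
    expected information gain used in element-wise acquisition."""
    scores = np.full(D, -np.inf)          # observed elements stay -inf
    for d in range(D):
        if mask[d]:                       # already acquired -> skip
            continue
        # One predictive draw of element d per posterior sample
        preds = weight_samples[:, d] * x_observed.sum()
        scores[d] = preds.var()           # high disagreement = informative
    return scores

x = rng.normal(size=D)
mask = np.array([True, False, False, True, False])  # which elements we own
scores = acquisition_scores(x, mask)
best = int(np.argmax(scores))             # element to acquire (and pay for) next
```

In a full loop, the chosen element would be observed, the mask updated, and the posterior samples refreshed before the next acquisition step.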
