Icebreaker: Element-wise Active Information Acquisition with Bayesian Deep Latent Gaussian Model

In this paper we introduce the ice-start problem, i.e., the challenge of deploying machine learning models when little or no training data is initially available and acquiring each feature element of the data incurs a cost. This setting is representative of real-world machine learning applications. For instance, in the health-care domain, when training an AI system to predict patient metrics from lab tests, obtaining every single measurement comes at a high cost. Active learning, where only the label incurs a cost, does not apply to this problem, because performing all possible lab tests to acquire a new training datum would be costly as well as unnecessary due to redundancy. We propose Icebreaker, a principled framework for approaching the ice-start problem. Icebreaker uses a fully Bayesian Deep Latent Gaussian Model (BELGAM) with a novel inference method that combines recent advances in amortized inference and stochastic gradient MCMC to enable fast and accurate posterior inference. By exploiting BELGAM's ability to fully quantify model uncertainty, we also propose two information acquisition functions, one for imputation and one for active prediction. We demonstrate that BELGAM performs significantly better than previous VAE (variational autoencoder) based models when the data set is small, on machine learning benchmarks as well as real-world recommender-system and health-care applications. Moreover, building on BELGAM, Icebreaker further improves performance and demonstrates the ability to reach the best test-time performance with a minimal amount of training data.
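The element-wise acquisition idea described above — scoring each unobserved feature element by how much observing it would reduce model uncertainty — can be sketched with a BALD-style mutual-information estimate. The sketch below is illustrative, not the paper's exact acquisition function: it assumes Gaussian predictive distributions per posterior sample and moment-matches the predictive mixture; the function names are hypothetical.

```python
import numpy as np

def bald_scores(mu, sigma):
    """BALD-style acquisition scores for unobserved feature elements.

    mu, sigma: arrays of shape (S, N) -- predictive mean and std for each
    of N candidate elements under S posterior samples of the model.
    Returns one score per element: a moment-matched estimate of the mutual
    information between the element's value and the model parameters.
    """
    # Expected entropy of a Gaussian predictive, averaged over posterior samples
    cond_ent = 0.5 * np.log(2.0 * np.pi * np.e * sigma**2).mean(axis=0)
    # Entropy of the moment-matched mixture: variance = E[sigma^2] + Var[mu]
    mix_var = (sigma**2).mean(axis=0) + mu.var(axis=0)
    total_ent = 0.5 * np.log(2.0 * np.pi * np.e * mix_var)
    # Mutual information = total predictive entropy - expected conditional entropy
    return total_ent - cond_ent

def next_query(mu, sigma):
    # Index of the element whose acquisition is expected to be most informative
    return int(np.argmax(bald_scores(mu, sigma)))
```

Elements on which the posterior samples disagree (high variance of `mu` across samples) score highest, so the acquisition loop queries exactly where the model's epistemic uncertainty is largest.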
