On Tractable Computation of Expected Predictions

Computing expected predictions of discriminative models is a fundamental task in machine learning that appears in many applications, including fairness, handling missing values, and data analysis. Unfortunately, computing expectations of a discriminative model with respect to a probability distribution defined by an arbitrary generative model has been shown to be hard in general; the task is intractable even for simple pairings such as a logistic regression classifier and a naive Bayes distribution. In this paper, we identify a pair of generative and discriminative models that enables tractable computation of expectations, as well as moments of any order, of the latter with respect to the former in the case of regression. Specifically, we consider expressive probabilistic circuits with certain structural constraints that support tractable probabilistic inference. Moreover, we exploit the tractable computation of high-order moments to derive an algorithm that approximates the expectations in classification scenarios, where exact computation is intractable. Our framework for computing expected predictions handles missing data at prediction time in a principled and accurate way and enables reasoning about the behavior of discriminative models. We empirically show that our algorithm consistently outperforms standard imputation techniques on a variety of datasets. Finally, we illustrate how our framework can be used for exploratory data analysis.
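
To make the classification approximation concrete, below is a minimal sketch of the moment-based Taylor idea the abstract alludes to: approximate E[sigmoid(f(X))] by expanding the sigmoid around the first moment of the regressor output f(X) and plugging in its higher central moments. The function names (`sigmoid_derivs`, `approx_expected_sigmoid`) and the interface, with moments supplied as a plain list, are illustrative assumptions rather than the paper's implementation; in the paper, the moments themselves are computed exactly from a compatible pair of circuits.

```python
import numpy as np
from math import comb, factorial
from numpy.polynomial import polynomial as P


def sigmoid_derivs(x, order):
    """Values of the sigmoid and its first `order` derivatives at x.

    Every derivative of the sigmoid is a polynomial in the sigmoid
    itself: if d^n sigma / dx^n = p_n(s) with s = sigma(x), then
    p_{n+1}(s) = p_n'(s) * s * (1 - s) by the chain rule.
    """
    s = 1.0 / (1.0 + np.exp(-x))
    p = np.array([0.0, 1.0])             # p_0(s) = s (coeffs, low to high)
    chain = np.array([0.0, 1.0, -1.0])   # s * (1 - s) = s - s^2
    out = []
    for _ in range(order + 1):
        out.append(P.polyval(s, p))
        p = P.polymul(P.polyder(p), chain)
    return out


def approx_expected_sigmoid(raw_moments):
    """Taylor approximation of E[sigmoid(f(X))] from raw moments of f(X).

    raw_moments[k] = E[f(X)^k], with raw_moments[0] == 1. Here the
    moments are plain inputs so the sketch stays self-contained; in the
    paper they come from tractable circuit computations.
    """
    order = len(raw_moments) - 1
    mu = raw_moments[1]                  # expansion point: the mean of f(X)
    # Raw moments -> central moments E[(f - mu)^k] via the binomial theorem.
    central = [
        sum((-1) ** (k - j) * comb(k, j) * raw_moments[j] * mu ** (k - j)
            for j in range(k + 1))
        for k in range(order + 1)
    ]
    derivs = sigmoid_derivs(mu, order)
    # E[sigmoid(f)] ~= sum_k sigmoid^(k)(mu) / k! * E[(f - mu)^k]
    return sum(derivs[k] / factorial(k) * central[k]
               for k in range(order + 1))


# Toy usage: if f(X) ~ N(0.5, 1), its raw moments up to order 4 are known
# in closed form, and the order-4 approximation is close to the true value.
mu, var = 0.5, 1.0
raw = [1.0, mu, var + mu ** 2, mu ** 3 + 3 * mu * var,
       mu ** 4 + 6 * mu ** 2 * var + 3 * var ** 2]
print(approx_expected_sigmoid(raw))
```

The truncation order trades accuracy for cost: each extra order needs one more moment of f(X), and the approximation is exact whenever the sigmoid's higher derivatives vanish over the support of f(X).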
