Optimal Feature Selection for Decision Robustness in Bayesian Networks

In many applications, one can define a large set of features to support the classification task at hand. At test time, however, evaluating all of them can be prohibitively expensive, so only a small subset of features is used, often selected for its information-theoretic value. For threshold-based Naive Bayes classifiers, recent work has suggested selecting features that maximize the expected robustness of the classifier, that is, the expected probability that it maintains its decision after seeing more features. We propose the first algorithm to compute this expected same-decision probability for general Bayesian network classifiers, based on compiling the network into a tractable circuit representation. Moreover, we develop a search algorithm for optimal feature selection that utilizes efficient incremental circuit modifications. Experiments on Naive Bayes classifiers, as well as on more general networks, show the efficacy and distinct behavior of this decision-making approach.
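To make the objective concrete, here is a minimal brute-force sketch of the expected same-decision probability (E-SDP) and of choosing the best feature subset under a budget. It assumes a toy Naive Bayes model with a binary class D, binary features, made-up parameters, and the decision rule "predict D = 1 iff Pr(D = 1 | e) >= T". All names (`PRIOR_D`, `P_POS`, `P_NEG`, `expected_sdp`, `best_subset`, ...) are hypothetical. The sketch enumerates instantiations exhaustively, which is exponential in the number of features; the approach described in the abstract instead compiles the network into a tractable circuit and searches with incremental circuit modifications, so this is only a reference for *what* is computed, not the paper's algorithm.

```python
from itertools import combinations, product

# Hypothetical toy parameters (an assumption, not taken from the paper).
PRIOR_D = 0.3                   # Pr(D = 1)
P_POS = [0.9, 0.7, 0.6, 0.8]    # Pr(X_i = 1 | D = 1) for feature i
P_NEG = [0.2, 0.4, 0.5, 0.3]    # Pr(X_i = 1 | D = 0) for feature i
THRESHOLD = 0.5                 # decide D = 1 iff Pr(D = 1 | evidence) >= T


def joint(d, evidence):
    """Pr(D = d, evidence) for a partial assignment {feature index: 0/1}."""
    p = PRIOR_D if d == 1 else 1.0 - PRIOR_D
    for i, x in evidence.items():
        q = P_POS[i] if d == 1 else P_NEG[i]
        p *= q if x == 1 else 1.0 - q
    return p


def prob_evidence(evidence):
    """Pr(evidence), marginalizing out the class variable D."""
    return joint(1, evidence) + joint(0, evidence)


def decide(evidence):
    """Threshold-based decision on the posterior Pr(D = 1 | evidence)."""
    return joint(1, evidence) / prob_evidence(evidence) >= THRESHOLD


def sdp(evidence, hidden):
    """Probability that observing all hidden features leaves the decision unchanged."""
    current = decide(evidence)
    pe = prob_evidence(evidence)
    total = 0.0
    for values in product([0, 1], repeat=len(hidden)):
        full = {**evidence, **dict(zip(hidden, values))}
        if decide(full) == current:
            total += prob_evidence(full) / pe  # Pr(h | e)
    return total


def expected_sdp(selected, all_features):
    """E-SDP: SDP w.r.t. the unselected features, averaged over Pr(e) of the selected ones."""
    hidden = [i for i in all_features if i not in selected]
    total = 0.0
    for values in product([0, 1], repeat=len(selected)):
        e = dict(zip(selected, values))
        total += prob_evidence(e) * sdp(e, hidden)
    return total


def best_subset(k, n_features=len(P_POS)):
    """Exhaustive search over all size-k subsets, maximizing E-SDP."""
    features = list(range(n_features))
    return max(combinations(features, k),
               key=lambda s: expected_sdp(list(s), features))


if __name__ == "__main__":
    chosen = best_subset(2)
    print(chosen, expected_sdp(list(chosen), list(range(len(P_POS)))))
```

Observing every feature trivially gives an E-SDP of 1, since nothing remains that could change the decision; the interesting trade-off appears when the budget k is smaller than the number of features.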
