Interpretable Selection and Visualization of Features and Interactions Using Bayesian Forests

It is becoming increasingly important for machine learning methods to make predictions that are interpretable as well as accurate. In many practical applications, it is of interest which features and feature interactions are relevant to the prediction task. We present a novel method, Selective Bayesian Forest Classifier, that strikes a balance between predictive power and interpretability by simultaneously performing classification, feature selection, feature interaction detection and visualization. It builds parsimonious yet flexible models using tree-structured Bayesian networks, and samples an ensemble of such models using Markov chain Monte Carlo. We build in feature selection by dividing the trees into two groups according to their relevance to the outcome of interest. Our method performs competitively on classification and feature selection benchmarks in low and high dimensions, and includes a visualization tool that provides insight into relevant features and interactions.

[1]  David Maxwell Chickering,et al.  Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.

[2]  J. Ross Quinlan,et al.  C4.5: Programs for Machine Learning , 1992 .

[3]  Gregory F. Cooper,et al.  The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks , 1990, Artif. Intell..

[4]  Adrian E. Raftery,et al.  Bayesian Model Averaging , 1998 .

[5]  Masoud Nikravesh,et al.  Feature Extraction: Foundations and Applications (Studies in Fuzziness and Soft Computing) , 2006 .

[6]  Ron Kohavi,et al.  Wrappers for Feature Subset Selection , 1997, Artif. Intell..

[7]  C. N. Liu,et al.  Approximating discrete probability distributions with dependence trees , 1968, IEEE Trans. Inf. Theory.

[8]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[9]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[10]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[11]  Jun S. Liu,et al.  Bayesian inference of epistatic interactions in case-control studies , 2007, Nature Genetics.

[12]  Richard A. Olshen,et al.  CART: Classification and Regression Trees , 1984 .

[13]  Adrian E. Raftery,et al.  Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data , 2005, Bioinform..

[14]  Nir Friedman,et al.  Bayesian Network Classifiers , 1997, Machine Learning.

[15]  Shijie Tang,et al.  Nonparametric variable selection and classification: The CATCH algorithm , 2014, Comput. Stat. Data Anal..

[16]  Steve R. Gunn,et al.  Result Analysis of the NIPS 2003 Feature Selection Challenge , 2004, NIPS.

[17]  R. Tibshirani Regression Shrinkage and Selection via the Lasso , 1996 .

[18]  H. Chipman,et al.  BART: Bayesian Additive Regression Trees , 2008, 0806.3286.

[19]  Christina Kendziorski,et al.  An efficient method for identifying statistical interactors in gene association networks. , 2009, Biostatistics.

[20]  Ron Kohavi,et al.  Supervised and Unsupervised Discretization of Continuous Features , 1995, ICML.

[21]  H. Chipman,et al.  Bayesian Additive Regression Trees , 2006 .

[22]  Krzysztof J. Cios,et al.  New synthesis of bayesian network classifiers and cardiac spect image interpretation , 1999 .

[23]  Paul D. McNicholas,et al.  Variable Selection for Clustering and Classification , 2013, J. Classif..

[24]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[25]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[26]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[27]  Geoffrey I. Webb,et al.  Not So Naive Bayes: Aggregating One-Dependence Estimators , 2005, Machine Learning.

[28]  Tomaso A. Poggio,et al.  Regularization Networks and Support Vector Machines , 2000, Adv. Comput. Math..