Automatically Mining Relevant Variable Interactions via Sparse Bayesian Learning

With the rapid increase in the availability of large amounts of data, prediction has become increasingly popular and is now widespread in daily life. However, powerful nonlinear prediction methods such as deep learning and support vector machines (SVMs) suffer from a lack of interpretability, which makes them hard to use in domains where the reasoning behind a decision is required. In this paper, we develop an interpretable nonlinear model called itemset Sparse Bayes (iSB), which builds a Bayesian probabilistic model while simultaneously accounting for variable interactions. To suppress the resulting large number of variables, sparsity is imposed on the regression weights through a sparsity-inducing prior. As a subroutine for searching over variable interactions, an itemset enumeration algorithm is employed with a novel bounding condition. In computational experiments on a real-world dataset, the proposed method outperformed decision trees by 10% in terms of $r^{2}$. We also demonstrate the advantage of our method in a Bayesian optimization setting, where the proposed approach successfully found the maximum of an unknown function while maintaining transparency. In contrast to Bayesian optimization with a Gaussian process, iSB gives us a clue as to which variable interactions are important in optimizing an unknown function.
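The core recipe, stripped of iSB's efficient search, can be sketched in a few lines: enumerate candidate itemsets (products of binary variables) as extra features, then fit a sparse Bayesian regression whose prior prunes irrelevant interactions. The sketch below is an illustration under stated assumptions, not the authors' implementation: it substitutes brute-force enumeration of low-order interactions for the bounded itemset search, and scikit-learn's ARDRegression for the paper's specific sparsity-inducing prior.

```python
# Minimal sketch of the iSB idea (NOT the authors' implementation):
# enumerate itemset interaction features, then fit a sparse Bayesian
# regression so an ARD-style prior prunes irrelevant interactions.
from itertools import combinations

import numpy as np
from sklearn.linear_model import ARDRegression


def expand_interactions(X, max_order=2):
    """Augment a binary feature matrix X with products over feature
    subsets (itemsets) up to max_order. Returns the expanded matrix
    and the itemset associated with each column."""
    d = X.shape[1]
    itemsets = [(j,) for j in range(d)]
    cols = [X[:, j] for j in range(d)]
    for order in range(2, max_order + 1):
        for subset in combinations(range(d), order):
            cols.append(np.prod(X[:, list(subset)], axis=1))
            itemsets.append(subset)
    return np.column_stack(cols), itemsets


# Toy data: y depends on x0 alone and on the interaction x1 * x2.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 6)).astype(float)
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] * X[:, 2] + 0.1 * rng.standard_normal(200)

Z, itemsets = expand_interactions(X, max_order=2)
model = ARDRegression().fit(Z, y)

# The sparsity-inducing prior drives most weights toward zero; the
# surviving itemsets are the "relevant variable interactions".
for itemset, w in zip(itemsets, model.coef_):
    if abs(w) > 0.1:
        print(itemset, round(w, 2))
```

On this toy data the fit retains essentially two columns, the singleton (0,) and the pair (1, 2), which is the interaction-level readout that makes such a model transparent. Note that brute-force expansion grows combinatorially in the number of features; the bounding condition in the paper exists precisely to avoid materializing all itemsets.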
