Causality-based Feature Selection: Methods and Evaluations

Feature selection is a crucial preprocessing step in data analytics and machine learning. Classical feature selection algorithms select features based on the correlations between predictive features and the class variable and do not attempt to capture causal relationships between them. Knowledge of the causal relationships between features and the class variable has been shown to offer potential benefits for building interpretable and robust prediction models, since causal relationships imply the underlying mechanism of a system. Consequently, causality-based feature selection has gradually attracted more attention, and many algorithms have been proposed. In this paper, we present a comprehensive review of recent advances in causality-based feature selection. To facilitate the development of new algorithms in this research area and to ease comparisons between new methods and existing ones, we develop the first open-source package, called CausalFS, which includes most of the representative causality-based feature selection algorithms (available at this https URL). Using CausalFS, we conduct extensive experiments to compare the representative algorithms on both synthetic and real-world data sets. Finally, we discuss some challenging problems to be tackled in future causality-based feature selection research.
