A Novel Scalable and Data Efficient Feature Subset Selection Algorithm

In this paper, we aim to identify the minimal subset of discrete random variables that is relevant for probabilistic classification in data sets with many variables but few instances. A principled solution to this problem is to determine the Markov boundary of the class variable. We present a novel Markov boundary learning algorithm that is scalable, data efficient, and correct under the so-called faithfulness condition. We report extensive empirical experiments on synthetic and real data sets scaling up to 139,351 variables.
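
To make the notion of a Markov boundary concrete, the following minimal sketch reads it off a Bayesian network whose structure is already known. Under faithfulness, the Markov boundary of a variable is the set of its parents, its children, and the other parents of its children. This is not the learning algorithm of the paper, which has to infer the boundary from data without access to the graph. The function name `markov_boundary` and the toy graph are hypothetical and chosen for illustration only.

```python
from typing import Dict, Set


def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return parents, children, and spouses of `target`.

    `parents` maps every node of a DAG to the set of its parents.
    The graph is assumed to be known, which is NOT the case in the
    learning setting the abstract describes.
    """
    # Children are the nodes that list `target` among their parents.
    children = {node for node, pa in parents.items() if target in pa}
    # Spouses are the other parents of those children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    return (parents[target] | children | spouses) - {target}


# Hypothetical toy DAG: A -> C, C -> D, B -> D, D -> E
toy_dag = {
    "A": set(),
    "B": set(),
    "C": {"A"},
    "D": {"C", "B"},
    "E": {"D"},
}

print(sorted(markov_boundary(toy_dag, "C")))  # ['A', 'B', 'D']
```

In this toy graph, E is not in the boundary of C: once D is known, E carries no further information about C, which is why the boundary is a natural choice of feature subset for predicting a class variable.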
