Minimum redundancy maximum relevancy versus score-based methods for learning Markov boundaries

Feature subset selection is becoming an increasingly important preprocessing step in automatic classification: the problem domains currently considered often involve a large number of variables, so some form of dimensionality reduction is needed to make the classification task tractable. In this paper we carry out an experimental comparison between a state-of-the-art feature selection method, minimum Redundancy Maximum Relevance (mRMR), and a recently proposed method for learning Markov boundaries that searches for Bayesian network structures in constrained spaces using standard scoring functions.
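To make the comparison concrete: mRMR [11,14] greedily adds, at each step, the feature whose mutual information with the class is highest after subtracting its average mutual information with the features already selected. The sketch below is a minimal illustration of that criterion, not the authors' implementation; it assumes discrete-valued features and uses scikit-learn's mutual_info_score, and the function name mrmr_select and the synthetic data are purely illustrative.

```python
# Minimal sketch of greedy mRMR selection (difference form, "MID"),
# assuming discrete features; names and data are illustrative only.
import numpy as np
from sklearn.metrics import mutual_info_score

def mrmr_select(X, y, k):
    """Greedily pick k column indices of X, maximizing mutual information
    with y minus mean redundancy with the features already chosen."""
    n = X.shape[1]
    relevance = np.array([mutual_info_score(X[:, i], y) for i in range(n)])
    selected = [int(np.argmax(relevance))]  # seed with the most relevant feature
    while len(selected) < k:
        candidates = [i for i in range(n) if i not in selected]
        scores = []
        for i in candidates:
            redundancy = np.mean([mutual_info_score(X[:, i], X[:, j])
                                  for j in selected])
            scores.append(relevance[i] - redundancy)  # relevance minus redundancy
        selected.append(candidates[int(np.argmax(scores))])
    return selected

# Tiny usage example on synthetic discrete data.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(500, 8))
y = (X[:, 0] + X[:, 1]) % 3   # class depends on features 0 and 1
print(mrmr_select(X, y, 3))   # features 0 and 1 should be picked early
```

Peng et al. also study a quotient variant (MIQ) that divides relevance by redundancy instead of subtracting it. The score-based alternative of de Campos et al. [9], by contrast, learns a Bayesian network structure around the class variable and reads off its Markov boundary: the variable's parents, children, and the other parents of its children.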

[1] Ian H. Witten, et al. The WEKA data mining software: an update, 2009, SIGKDD Explorations.

[2] Gregory F. Cooper, et al. The ALARM Monitoring System: A Case Study with Two Probabilistic Inference Techniques for Belief Networks, 1989, AIME.

[3] Pat Langley, et al. Selection of Relevant Features and Examples in Machine Learning, 1997, Artif. Intell.

[4] Ron Kohavi, et al. Wrappers for Feature Subset Selection, 1997, Artif. Intell.

[5] Stuart J. Russell, et al. Adaptive Probabilistic Networks with Hidden Variables, 1997, Machine Learning.

[6] Kristian Kristensen, et al. The use of a Bayesian network in the design of a decision support system for growing malting barley without use of pesticides, 2002.

[7] Ron Kohavi, et al. Irrelevant Features and the Subset Selection Problem, 1994, ICML.

[8] Judea Pearl, et al. Probabilistic reasoning in intelligent systems: networks of plausible inference, 1991, Morgan Kaufmann Series in Representation and Reasoning.

[9] Luis M. de Campos, et al. Score-based methods for learning Markov boundaries by searching in constrained spaces, 2011, Data Mining and Knowledge Discovery.

[10] Claus Skaanning. Blocking Gibbs Sampling for Inference in Large and Complex Bayesian Networks with Applications in Genetics, 1997.

[11] Fuhui Long, et al. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy, 2005, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[12] Pat Langley, et al. Induction of Selective Bayesian Classifiers, 1994, UAI.

[13] Jesper Tegnér, et al. Towards scalable and data efficient learning of Markov boundaries, 2007, Int. J. Approx. Reason.

[14] Chris H. Q. Ding, et al. Minimum redundancy feature selection from microarray gene expression data, 2003, Proceedings of the 2003 IEEE Computational Systems Bioinformatics Conference (CSB 2003).

[15] Constantin F. Aliferis, et al. Algorithms for Large Scale Markov Blanket Discovery, 2003, FLAIRS.

[16] A. H. Murphy, et al. Hailfinder: A Bayesian system for forecasting severe weather, 1996.

[17] David Maxwell Chickering, et al. Learning Bayesian Networks: The Combination of Knowledge and Statistical Data, 1994, Machine Learning.

[18] Gregory F. Cooper, et al. A Bayesian Method for the Induction of Probabilistic Networks from Data, 1992.

[19] Luis M. de Campos, et al. Learning Bayesian Network Classifiers: Searching in a Space of Partially Directed Acyclic Graphs, 2005, Machine Learning.

[20] Jose Miguel Puerta, et al. A Fast Hill-Climbing Algorithm for Bayesian Networks Structure Learning, 2007, ECSQARU.