Score-based methods for learning Markov boundaries by searching in constrained spaces

In probabilistic classification problems, learning the Markov boundary of the class variable is the optimal approach to feature subset selection. In this paper we propose two algorithms that learn the Markov boundary of a selected variable. Both algorithms follow the score+search paradigm for learning Bayesian networks: they use standard scoring functions, but they search in constrained spaces of class-focused directed acyclic graphs and move through those spaces with operators adapted to the problem. The algorithms have been validated experimentally on a wide range of databases, and the results show performance competitive with the state of the art.
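To make the notion of a Markov boundary concrete, the following minimal sketch reads it off a directed acyclic graph that is already known. It assumes the graph is faithful to the underlying distribution, in which case the Markov boundary of a variable is its parents, its children, and the other parents of its children. This is not either of the proposed algorithms, which must learn the structure from data; the function name `markov_boundary`, the parent-set representation, and the toy graph are all hypothetical illustrations.

```python
from typing import Dict, Set


def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the parents, children, and spouses of `target` in a DAG.

    `parents` maps every node to the set of its parents (an assumed,
    illustrative representation; it is not taken from the paper).
    """
    pa = set(parents.get(target, set()))
    # Children are the nodes that list the target among their parents.
    children = {node for node, ps in parents.items() if target in ps}
    # Spouses are the other parents of those children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    return (pa | children | spouses) - {target}


# Toy class-focused graph: C is the class variable.
dag = {
    "A": set(),
    "B": set(),
    "C": {"A"},        # A -> C
    "X1": {"C"},       # C -> X1
    "X2": {"C", "B"},  # C -> X2 <- B, so B is a spouse of C
    "X3": {"X1"},      # X1 -> X3, outside the boundary of C
}

mb = markov_boundary(dag, "C")
print(sorted(mb))
```

In this toy graph, `X3` is excluded because it is separated from `C` once `X1` is known, which is why selecting only the Markov boundary loses no information about the class under the stated assumption.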
