A hybrid algorithm for Bayesian network structure learning with application to multi-label learning

We present a novel hybrid algorithm for Bayesian network structure learning, called H2PC. It first reconstructs the skeleton of a Bayesian network and then performs a Bayesian-scoring greedy hill-climbing search to orient the edges. The algorithm relies on divide-and-conquer, constraint-based subroutines that learn the local structure around a target variable. We conduct two series of experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is currently the most powerful state-of-the-art algorithm for Bayesian network structure learning. First, we use eight well-known Bayesian network benchmarks with various sample sizes to assess the quality of the structures learned by each algorithm. Our extensive experiments show that H2PC outperforms MMHC both in goodness of fit to new data and in the quality of the network structure with respect to the true dependence structure of the data. Second, we investigate H2PC's ability to solve the multi-label learning problem. We provide theoretical results that characterize the so-called minimal label powersets, which appear as irreducible factors in the joint distribution under the faithfulness condition, and that identify them graphically. The multi-label learning problem is then decomposed into a series of multi-class classification problems, where each multi-class variable encodes a label powerset. H2PC is shown to compare favorably to MMHC in terms of global classification accuracy over ten multi-label data sets covering different application domains. Overall, our experiments support the conclusion that local structure learning with H2PC, in the form of local neighborhood induction, is a theoretically well-motivated and empirically effective learning framework that is well suited to multi-label learning. The R source code of H2PC, as well as all data sets used in the empirical tests, is publicly available.
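
The two-phase idea behind a hybrid learner (constraint-based skeleton discovery, then a score-based search restricted to that skeleton) can be illustrated with a small, self-contained Python sketch. It is not H2PC: the skeleton phase below is a simplified PC-style pass that only tries conditioning sets of size zero or one with a G-test, and the search phase uses BIC as a stand-in for the Bayesian score. The function names (`independent`, `skeleton`, `hill_climb`), the Wilson-Hilferty approximation of the chi-square cut-off (used only to avoid a SciPy dependency), and the toy chain data are all illustrative assumptions.

```python
import itertools
import math
from collections import Counter
from statistics import NormalDist


def independent(data, x, y, z=(), alpha=0.05):
    """G-test of X _||_ Y | Z on discrete rows; Wilson-Hilferty chi2 cut-off."""
    key = lambda r: tuple(r[k] for k in z)
    n_xyz = Counter((r[x], r[y], key(r)) for r in data)
    n_xz = Counter((r[x], key(r)) for r in data)
    n_yz = Counter((r[y], key(r)) for r in data)
    n_z = Counter(key(r) for r in data)
    g = 2 * sum(c * math.log(c * n_z[s] / (n_xz[(a, s)] * n_yz[(b, s)]))
                for (a, b, s), c in n_xyz.items())
    levels = lambda v: len({r[v] for r in data})
    df = (levels(x) - 1) * (levels(y) - 1) * len(n_z)
    if df == 0:
        return True  # a constant variable cannot carry any dependence
    q = NormalDist().inv_cdf(1 - alpha)
    return g <= df * (1 - 2 / (9 * df) + q * math.sqrt(2 / (9 * df))) ** 3


def skeleton(data, n_vars, alpha=0.05):
    """Keep X - Y unless a conditioning set of size 0 or 1 separates them."""
    edges = set()
    for x, y in itertools.combinations(range(n_vars), 2):
        cond_sets = [()] + [(k,) for k in range(n_vars) if k not in (x, y)]
        if not any(independent(data, x, y, z, alpha) for z in cond_sets):
            edges.add(frozenset((x, y)))
    return edges


def bic_local(data, child, parents, card):
    """Log-likelihood of one family minus its BIC penalty."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    n_params = (card[child] - 1) * math.prod(card[p] for p in parents)
    return loglik - 0.5 * math.log(len(data)) * n_params


def is_acyclic(n_vars, arcs):
    """Kahn's algorithm: acyclic iff every node can be peeled off."""
    indegree = [0] * n_vars
    for _, child in arcs:
        indegree[child] += 1
    ready = [v for v in range(n_vars) if indegree[v] == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for parent, child in arcs:
            if parent == node:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return removed == n_vars


def hill_climb(data, n_vars, allowed):
    """Greedy add/delete/reverse search restricted to skeleton edges."""
    card = [len({r[v] for r in data}) for v in range(n_vars)]

    def score(arcs):
        return sum(bic_local(data, v, [p for p, c in sorted(arcs) if c == v], card)
                   for v in range(n_vars))

    arcs, best = frozenset(), score(frozenset())
    while True:
        moves = []
        for edge in allowed:
            x, y = sorted(edge)
            cur = (x, y) if (x, y) in arcs else (y, x) if (y, x) in arcs else None
            if cur is None:
                moves += [arcs | {(x, y)}, arcs | {(y, x)}]            # add
            else:
                moves += [arcs - {cur}, (arcs - {cur}) | {cur[::-1]}]  # delete, reverse
        moves = [m for m in moves if is_acyclic(n_vars, m)]
        if not moves:
            return arcs, best
        cand = max(moves, key=score)
        if score(cand) <= best + 1e-9:  # stop when no strict improvement
            return arcs, best
        arcs, best = cand, score(cand)


# Toy chain A -> B -> C (variables 0, 1, 2). Integer weights make the empirical
# distribution satisfy A _||_ C | B exactly.
data = []
for a, b, c in itertools.product((0, 1), repeat=3):
    data += [(a, b, c)] * (5 * (9 if b == a else 1) * (8 if c == b else 2))
skel = skeleton(data, 3)
arcs, bic = hill_climb(data, 3, skel)
print(sorted(tuple(sorted(e)) for e in skel), sorted(arcs))
```

On the toy chain, B separates A from C, so the skeleton phase keeps only the edges 0-1 and 1-2, and the search can do no more than orient those two edges. Shrinking the search space in this way is what the hybrid design buys; a practical implementation would also rescore only the families a move changes, whereas this sketch rescores the whole DAG for clarity.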

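Once the labels have been partitioned into powersets, the decomposition into multi-class problems amounts to a label-powerset transformation applied to each group. The Python sketch below assumes the partition `groups` is already given (it does not implement the graphical characterization of minimal label powersets) and leaves the choice of multi-class learner open. The helper names `encode_powersets`, `decode_powersets` and `global_accuracy` are hypothetical, and reading "global accuracy" as exact-match accuracy over the whole label vector is an assumption.

```python
def encode_powersets(Y, groups):
    """Turn each group of binary labels into one multi-class target.

    Y is a list of 0/1 label vectors; groups is a partition of label indices.
    Returns one class-id column per group plus the combination -> id tables.
    """
    columns, tables = [], []
    for group in groups:
        table = {}  # label combination -> class id, in order of first appearance
        columns.append([table.setdefault(tuple(row[j] for j in group), len(table))
                        for row in Y])
        tables.append(table)
    return columns, tables


def decode_powersets(columns, tables, groups, n_labels):
    """Map (predicted) class ids back to full binary label vectors."""
    inverse = [{cid: combo for combo, cid in t.items()} for t in tables]
    Y = [[0] * n_labels for _ in columns[0]]
    for col, inv, group in zip(columns, inverse, groups):
        for i, cid in enumerate(col):
            for j, bit in zip(group, inv[cid]):
                Y[i][j] = bit
    return Y


def global_accuracy(Y_true, Y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    return sum(t == p for t, p in zip(Y_true, Y_pred)) / len(Y_true)


Y = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 1, 0]]
groups = [[0, 1], [2, 3]]  # hypothetical partition into label powersets
columns, tables = encode_powersets(Y, groups)
print(columns)  # -> [[0, 1, 2, 0], [0, 1, 2, 0]]
```

Each column can then be fed to any multi-class classifier, one per powerset. As with any label-powerset transformation, a classifier trained on these targets can only predict label combinations that occur in the training data, which is one reason to keep the powersets as small as the dependence structure allows.
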