A hybrid random field model for scalable statistical learning

This paper introduces hybrid random fields, a class of probabilistic graphical models designed for efficient structure learning in high-dimensional domains. Hybrid random fields, together with the learning algorithm we develop for them, are best viewed as a pseudo-likelihood estimation technique, rather than as a technique for estimating strict joint probability distributions. To assess the generality of the proposed model, we prove that the class of pseudo-likelihood distributions representable by hybrid random fields strictly includes the class of joint probability distributions representable by Bayesian networks. Building on this result, we develop a scalable algorithm for learning the structure of hybrid random fields, which we call 'Markov Blanket Merging'. On the one hand, we characterize the complexity of Markov Blanket Merging both theoretically and experimentally, using a series of synthetic benchmarks. On the other hand, we evaluate the accuracy of hybrid random fields (as learned via Markov Blanket Merging) by comparing them to various alternative statistical models in a number of pattern classification and link-prediction applications. The results show that learning hybrid random fields by the Markov Blanket Merging algorithm not only significantly reduces the computational cost of structure learning relative to several considered alternatives, but also yields models that are highly accurate compared to the alternative ones.
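To make the pseudo-likelihood idea concrete: instead of scoring a full joint distribution, each variable is scored through its own conditional distribution given its Markov blanket, and the per-variable log-probabilities are summed. The following is a minimal illustrative sketch, not the paper's implementation; the variable names, blanket choices, and conditional probability tables are invented for the example.

```python
import math

def log_pseudo_likelihood(sample, blankets, cond_tables):
    """Sum of log P(x_i | Markov blanket of x_i) over all variables.

    sample:      dict variable -> observed value (0/1)
    blankets:    dict variable -> tuple of blanket variables
    cond_tables: dict variable -> {(blanket values..., own value): probability}
    """
    total = 0.0
    for var, blanket in blankets.items():
        # Look up P(x_var | blanket values) for the observed assignment.
        key = tuple(sample[b] for b in blanket) + (sample[var],)
        total += math.log(cond_tables[var][key])
    return total

# Toy example: two binary variables, each in the other's blanket
# (hypothetical numbers, chosen only to make the lookup concrete).
blankets = {"A": ("B",), "B": ("A",)}
cond_tables = {
    "A": {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6},
    "B": {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.7},
}
score = log_pseudo_likelihood({"A": 1, "B": 1}, blankets, cond_tables)
# score = log P(A=1 | B=1) + log P(B=1 | A=1) = log(0.6) + log(0.7)
```

Because each term involves only one variable and its (typically small) blanket, this factorization is what allows structure learning to scale to high-dimensional domains.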