Complexity of concept classes induced by discrete Markov networks and Bayesian networks

Abstract: Markov networks and Bayesian networks are two popular models for classification. The Vapnik–Chervonenkis (VC) dimension and the Euclidean dimension are two measures of the complexity of a class of functions and can be used to quantify the classification capability of classifiers. The VC dimension of the class of functions associated with a classifier can be used to construct an estimate of its generalization error. In this paper, we study the VC dimension and the Euclidean dimension of the concept classes induced by discrete Markov networks and Bayesian networks. We show that these two dimensions of the concept class induced by a discrete Markov network are identical, and that their common value equals the dimension of the toric ideal corresponding to the Markov network, provided that the toric ideal is nontrivial. As a consequence, this dimension can be computed directly with a computer algebra system. Furthermore, for a general Bayesian network, we show that the dimension of the corresponding toric ideal provides an upper bound on the Euclidean dimension. Finally, we illustrate how the VC dimension can be used to estimate the generalization error in binary classification.
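To make the computer-algebra claim concrete: the (Krull) dimension of a toric ideal I_A equals the rank of its defining integer matrix A (a standard fact from Sturmfels, "Gröbner bases and convex polytopes"), so for a Markov network the dimension can be obtained from the rank of the model's design matrix. The sketch below is a minimal Python illustration under that assumption; the matrix A shown is a hypothetical design matrix for a toy two-node binary Markov network, not one taken from the paper, and in practice one would build A from the network's clique structure or work in a computer algebra system such as SINGULAR.

```python
import numpy as np

# Hypothetical design matrix A of a toric model for a two-node binary
# Markov network X1 - X2. Columns are indexed by the joint states
# (x1, x2) in the order (0,0), (0,1), (1,0), (1,1); rows are the
# sufficient statistics (here, single-variable state indicators).
A = np.array([
    [1, 1, 0, 0],   # indicator of x1 = 0
    [0, 0, 1, 1],   # indicator of x1 = 1
    [1, 0, 1, 0],   # indicator of x2 = 0
    [0, 1, 0, 1],   # indicator of x2 = 1
])

# Assumed fact: the dimension of the toric ideal I_A equals rank(A).
dim_toric = np.linalg.matrix_rank(A)
print(f"dimension of the toric ideal: {dim_toric}")  # prints 3
```

The single linear dependency among the rows (the two blocks of indicators both sum to the all-ones vector) drops the rank from 4 to 3, which is the value the rank computation reports.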
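For the final point, estimating generalization error from the VC dimension in binary classification typically relies on the classical uniform-convergence bound: with probability at least 1 − δ over an i.i.d. sample of size m, every hypothesis in a class of VC dimension d satisfies R(h) ≤ R̂(h) + sqrt((d(ln(2m/d) + 1) + ln(4/δ)) / m). The following sketch simply evaluates this bound; the function name and the concrete numbers are illustrative assumptions, not values from the paper.

```python
import math

def vc_generalization_bound(emp_risk: float, vc_dim: int,
                            n_samples: int, delta: float = 0.05) -> float:
    """Classical VC upper bound on the true risk of a binary classifier.

    With probability at least 1 - delta over an i.i.d. sample of size
    n_samples, every hypothesis in a class of VC dimension vc_dim has
    true risk at most
        emp_risk + sqrt((d * (ln(2m/d) + 1) + ln(4/delta)) / m).
    """
    d, m = vc_dim, n_samples
    slack = math.sqrt((d * (math.log(2 * m / d) + 1)
                       + math.log(4 / delta)) / m)
    return emp_risk + slack

# Illustrative numbers only: a concept class of VC dimension 3 (e.g. the
# class induced by a small discrete Markov network), 10,000 samples, and
# 8% empirical error give a true-risk bound of roughly 0.138.
print(vc_generalization_bound(emp_risk=0.08, vc_dim=3, n_samples=10_000))
```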
