Inner Product Spaces for Bayesian Networks

Bayesian networks have become one of the major models used for statistical inference. We study the question of whether the decisions computed by a Bayesian network can be represented within a low-dimensional inner product space. We focus on two-label classification tasks over the Boolean domain. As our main results, we establish upper and lower bounds on the dimension of the inner product space for Bayesian networks with an explicitly given (full or reduced) parameter collection; these bounds are tight up to a factor of 2. For some nontrivial cases of Bayesian networks we even determine the exact values of this dimension. We further consider logistic autoregressive Bayesian networks and show that every sufficiently expressive inner product space must have dimension at least Ω(n²), where n is the number of network nodes. We also derive the bound 2^{Ω(n)} for an artificial variant of this network, thereby demonstrating the limits of our approach and raising an interesting open question. As a major technical contribution, this work reveals combinatorial and algebraic structures within Bayesian networks that allow known methods for deriving lower bounds on the dimension of inner product spaces to be brought into play.
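
To fix the central notion, the following is the standard formalization from the linear-arrangement literature (the symbols C, X, φ, and w_f are generic names introduced here for illustration): a concept class C of ±1-valued functions over a domain X, here X = {0,1}^n, is representable in a d-dimensional inner product space if there is a single feature map and, per concept, a weight vector whose inner product recovers the concept's labels by sign:

\[
\exists\, \varphi : X \to \mathbb{R}^d \ \text{ and, for each } f \in \mathcal{C},\ \exists\, w_f \in \mathbb{R}^d :
\qquad
f(x) \;=\; \operatorname{sign}\!\bigl(\langle w_f, \varphi(x) \rangle\bigr)
\quad \text{for all } x \in X .
\]

The dimension referred to in the abstract is the smallest such d. For the logistic autoregressive (sigmoid belief) networks mentioned above, a minimal sketch of the usual conditional model, assuming a fixed variable ordering and a bias term b_i, is

\[
\Pr[x_i = 1 \mid x_1, \ldots, x_{i-1}]
\;=\;
\sigma\Bigl( b_i + \sum_{j < i} w_{ij}\, x_j \Bigr),
\qquad
\sigma(y) \;=\; \frac{1}{1 + e^{-y}} .
\]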
