Order-based Discriminative Structure Learning for Bayesian Network Classifiers

We introduce a simple, empirical, order-based greedy heuristic for learning discriminative Bayesian network structures. We propose two metrics, both based on conditional mutual information, for establishing the ordering of N features. Given an ordering, we can find the discriminative classifier structure with O(N^q) score evaluations, where the constant q is the maximum number of parents per node. We present classification results on data sets from the UCI repository (Merz, Murphy, & Aha 1997), on a phonetic classification task using the TIMIT database (Lamel, Kassel, & Seneff 1986), and on the MNIST handwritten digit recognition task (LeCun et al. 1998). The discriminative structures found by our new procedures significantly outperform generatively produced structures and achieve classification accuracy on par with the best discriminative (naive greedy) Bayesian network learning approach, but with a speedup of roughly a factor of 10. We also show that the advantages of generative, discriminatively structured Bayesian network classifiers still hold in the case of missing features.
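The abstract does not spell out the two ordering metrics, so the following is only a minimal sketch of the general idea, not the authors' procedure. It estimates the conditional mutual information I(X; Y | Z) from counts over discrete samples, then builds a feature ordering greedily. The function names and the specific ordering rule (pick the feature most informative about the class given the previously chosen feature) are assumptions made for illustration.

```python
from collections import Counter
from math import log


def conditional_mutual_information(x, y, z):
    """Empirical I(X; Y | Z) in nats for three equal-length discrete sequences."""
    n = len(x)
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), c in n_xyz.items():
        # p(x,y,z) * log[ p(x,y|z) / (p(x|z) p(y|z)) ], written with raw counts
        total += (c / n) * log(c * n_z[zi] / (n_xz[(xi, zi)] * n_yz[(yi, zi)]))
    return total


def greedy_order(features, labels):
    """Hypothetical ordering rule (an assumption, not taken from the paper).

    features: dict mapping feature name -> list of discrete values
    labels:   list of class labels, same length as each feature list
    """
    remaining = list(features)
    order = []
    # A constant conditioning variable reduces I(C; X | Z) to plain I(C; X).
    previous = [0] * len(labels)
    while remaining:
        best = max(
            remaining,
            key=lambda f: conditional_mutual_information(labels, features[f], previous),
        )
        order.append(best)
        remaining.remove(best)
        previous = features[best]
    return order


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    data = {"noise": [0, 1, 0, 1], "informative": [0, 0, 1, 1]}
    print(greedy_order(data, labels))
```

With such an ordering fixed, each feature's parents would be searched only among the features that precede it, which is what keeps the number of score evaluations polynomial in N for a constant parent limit q.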

Franz Pernkopf | Jeff A. Bilmes

[1] James R. Glass, et al. Heterogeneous acoustic measurements for phonetic classification, 1997, EUROSPEECH.

[2] Franz Pernkopf, et al. Discriminative versus generative parameter and structure learning of Bayesian network classifiers, 2005, ICML.

[3] Michael I. Jordan, et al. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes, 2001, NIPS.

[4] David Heckerman, et al. Knowledge Representation and Inference in Similarity Networks and Bayesian Multinets, 1996, Artif. Intell.

[5] Daphne Koller, et al. Ordering-Based Search: A Simple and Effective Algorithm for Learning Bayesian Networks, 2005, UAI.

[6] Thomas M. Cover, et al. Elements of Information Theory, 2005.

[7] Judea Pearl, et al. Probabilistic reasoning in intelligent systems - networks of plausible inference, 1991, Morgan Kaufmann series in representation and reasoning.

[8] Ron Kohavi, et al. Wrappers for Feature Subset Selection, 1997, Artif. Intell.

[9] Usama M. Fayyad, et al. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning, 1993, IJCAI.

[10] Stuart J. Russell, et al. Dynamic Bayesian networks: representation, inference and learning, 2002.

[11] Jeff A. Bilmes, et al. A Submodular-supermodular Procedure with Applications to Discriminative Structure Learning, 2005, UAI.

[12] Gregory F. Cooper, et al. A Bayesian method for the induction of probabilistic networks from data, 1992, Machine Learning.

[13] Nir Friedman, et al. Bayesian Network Classifiers, 1997, Machine Learning.

[14] Eamonn J. Keogh, et al. Learning augmented Bayesian classifiers: A comparison of distribution-based and classification-based approaches, 1999, AISTATS.

[15] Henry Tirri, et al. On Discriminative Bayesian Network Classifiers and Logistic Regression, 2005, Machine Learning.

[16] Anthony Widjaja, et al. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond, 2003, IEEE Transactions on Neural Networks.

[17] Jeff A. Bilmes, et al. Natural statistical models for automatic speech recognition, 1999.

[18] Pedro Larrañaga, et al. Discriminative Learning of Bayesian Network Classifiers, 2006, Inteligencia Artif.

[19] Yoshua Bengio, et al. Gradient-based learning applied to document recognition, 1998, Proc. IEEE.

[20] Jeff A. Bilmes, et al. Dynamic Bayesian Multinets, 2000, UAI.

[21] Nir Friedman, et al. Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm, 1999, UAI.

[22] Derek G. Corneil, et al. Complexity of finding embeddings in a k-tree, 1987.

[23] Sanjoy Dasgupta, et al. The Sample Complexity of Learning Fixed-Structure Bayesian Networks, 1997, Machine Learning.

[24] Pedro M. Domingos, et al. Learning Bayesian network classifiers by maximizing conditional likelihood, 2004, ICML.

[25] Alexander J. Smola, et al. Learning with kernels, 1998.

[26] Bin Shen, et al. Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers, 2002, Machine Learning.

[27] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[28] Wray L. Buntine. Theory Refinement on Bayesian Networks, 1991, UAI.

[29] Christopher Meek, et al. Causal inference and causal explanation with background knowledge, 1995, UAI.