Generative and Discriminative Learning by CL-Net

This correspondence presents a two-stage classification learning algorithm. The first stage approximates the class-conditional distribution of a discrete space with a separate mixture model for each class, and the second stage investigates the class posterior probabilities by training a network. The first stage captures the generative information inherent in each class by using the Chow-Liu (CL) method, which approximates a high-dimensional probability distribution with a tree structure called a dependence tree. The second stage concentrates on discriminative learning to distinguish between classes. The resulting algorithm combines the advantages of generative and discriminative learning. Because it uses CL dependence-tree estimation, we call our algorithm CL-Net. Empirical tests indicate that the proposed algorithm significantly improves on related classifiers that are built by either generative learning or discriminative learning alone.
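To illustrate the dependence-tree idea mentioned above, the following is a minimal sketch of fitting the structure of a single Chow-Liu tree: it estimates the pairwise mutual information between discrete variables and then keeps a maximum-weight spanning tree over them. It does not reproduce the per-class mixture model or the network stage of CL-Net. The function names `mutual_information` and `chow_liu_tree`, the row-of-tuples data layout, and the use of Kruskal's algorithm for the spanning tree are assumptions made for this example.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(data, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(data)
    count_i = Counter(row[i] for row in data)
    count_j = Counter(row[j] for row in data)
    count_ij = Counter((row[i], row[j]) for row in data)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def chow_liu_tree(data):
    """Return the edges of a maximum mutual-information spanning tree.

    `data` is a list of equal-length tuples of discrete values.
    Kruskal's algorithm with a small union-find is used here (an
    illustrative choice; any maximum spanning tree method works).
    """
    d = len(data[0])
    # Candidate edges, heaviest (most informative) first.
    edges = sorted(
        ((mutual_information(data, i, j), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))

    def find(x):
        # Find the component root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding this edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


if __name__ == "__main__":
    # Toy data: columns 0 and 1 are identical, column 2 is independent.
    toy = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
    print(chow_liu_tree(toy))
```

In a classifier of the kind described above, one such tree (or a mixture of trees) would be fitted separately on the training samples of each class. How those per-class models feed the second-stage network is not specified in this text and is left out of the sketch.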
