Unsupervised training of Bayesian networks for data clustering

This paper presents a new approach to the unsupervised training of Bayesian network classifiers. Three models are analysed: the Chow and Liu (CL) multinets; the tree-augmented naive Bayes (TAN); and a new model called the simple Bayesian network classifier, which is more robust in its structure learning. The unsupervised training of these models uses the classification maximum likelihood criterion, whose maximization is derived for each model within the classification expectation–maximization (CEM) algorithm framework. To test the proposed unsupervised training approach, the clustering performance of the three models is measured on 10 well-known benchmark datasets. For comparison, results for the k-means and EM algorithms, as well as those obtained when the three Bayesian network classifiers are trained in a supervised way, are also analysed. A real-world image processing application is presented, dealing with the clustering of wood board images described by 165 attributes. Results show that the proposed learning method generally outperforms traditional clustering algorithms; in the wood board image application, the CL multinets achieved an average 12 per cent increase in clustering accuracy over k-means and an average 7 per cent increase over the EM algorithm.
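The classification EM procedure referred to above alternates a hard cluster assignment (C-step) with parameter re-estimation, which monotonically increases the classification likelihood. As a minimal sketch, the following illustrates CEM for a naive Bayes (fully independent attributes) model over binary attributes; the paper's actual models (CL multinets, TAN, and the simple Bayesian network classifier) add dependence structure on top of this scheme, and all names and parameters here are illustrative, not taken from the paper.

```python
import numpy as np

def cem_naive_bayes(X, k, z0=None, n_iter=50, alpha=1.0, seed=0):
    """Classification EM sketch for a naive Bayes model with binary
    attributes: E+C step hard-assigns each sample to its most probable
    cluster, the M-step re-estimates parameters from that partition."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    z = rng.integers(k, size=n) if z0 is None else z0.copy()
    for _ in range(n_iter):
        # M-step: Laplace-smoothed estimates from the current partition
        counts = np.array([(z == c).sum() for c in range(k)], dtype=float)
        pi = (counts + alpha) / (counts + alpha).sum()
        theta = np.array([(X[z == c].sum(0) + alpha) / (counts[c] + 2 * alpha)
                          for c in range(k)])
        # E+C step: hard assignment to the highest-posterior cluster
        log_p = (np.log(pi)[None, :]
                 + X @ np.log(theta).T
                 + (1 - X) @ np.log(1 - theta).T)
        z_new = log_p.argmax(1)
        if np.array_equal(z_new, z):
            break  # partition is stable; classification likelihood converged
        z = z_new
    return z

# Toy demo: two well-separated binary clusters, with a quarter of the
# initial partition labels deliberately corrupted.
X = np.vstack([np.tile([1, 1, 0, 0], (20, 1)),
               np.tile([0, 0, 1, 1], (20, 1))]).astype(float)
z0 = np.repeat([0, 1], 20)
z0[:5], z0[20:25] = 1, 0
labels = cem_naive_bayes(X, k=2, z0=z0)
```

Unlike soft EM, the C-step commits each sample to a single cluster, so the update behaves like a model-based generalization of k-means and typically converges in few iterations.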
