Uni- and Multi-Dimensional Clustering Via Bayesian Networks

This chapter discusses model-based clustering via Bayesian networks, covering both uni-dimensional and multi-dimensional methods. For uni-dimensional clustering, the main idea is to use the Bayesian structural clustering algorithm, a greedy algorithm that makes use of the EM algorithm. For multi-dimensional clustering, we investigate latent tree models, which, to our knowledge, are the only model-based approach to multi-dimensional clustering. There are generally two approaches to learning latent tree models: greedy search and feature selection. The former covers a wider range of models, while the latter is more time-efficient. However, latent tree models cannot capture dependencies between partitions through attributes, so we propose two approaches to overcome this shortcoming. The first extends the idea of Bayesian structural clustering for uni-dimensional clustering, while the second combines feature selection methods with the main idea of multi-dimensional classification with Bayesian networks. We test the second approach on both real and synthetic data. The results show that it finds meaningful and novel partitions.
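As a rough illustration of the EM step that the uni-dimensional approach builds on, the sketch below fits the simplest Bayesian network clustering model: a naive Bayes structure with a single hidden cluster variable over categorical attributes. It is not the chapter's algorithm. The greedy structure search is omitted entirely, and the function name `em_latent_class`, the Laplace smoothing, the random initialisation, and the fixed iteration count are all assumptions made for this example.

```python
import math
import random


def em_latent_class(data, n_clusters, n_values, n_iter=50, seed=0):
    """EM for a naive Bayes model with one hidden cluster variable.

    data: list of equal-length tuples of ints in range(n_values).
    Returns (priors, cond, resp), where cond[k][j][v] estimates
    P(attribute j = v | cluster k) and resp[i][k] estimates
    P(cluster k | data[i]).
    """
    rng = random.Random(seed)
    n, d = len(data), len(data[0])

    # Start from random soft assignments (an assumption of this sketch).
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])

    priors, cond = None, None
    for _ in range(n_iter):
        # M-step: re-estimate parameters from expected counts,
        # with add-one (Laplace) smoothing to avoid zero probabilities.
        weights = [sum(resp[i][k] for i in range(n)) for k in range(n_clusters)]
        priors = [(w + 1.0) / (n + n_clusters) for w in weights]
        cond = [[[1.0] * n_values for _ in range(d)] for _ in range(n_clusters)]
        for i, x in enumerate(data):
            for k in range(n_clusters):
                for j, v in enumerate(x):
                    cond[k][j][v] += resp[i][k]
        for k in range(n_clusters):
            for j in range(d):
                total = sum(cond[k][j])
                cond[k][j] = [c / total for c in cond[k][j]]

        # E-step: posterior over clusters for each data point,
        # computed in log space for numerical stability.
        for i, x in enumerate(data):
            logp = [
                math.log(priors[k])
                + sum(math.log(cond[k][j][v]) for j, v in enumerate(x))
                for k in range(n_clusters)
            ]
            m = max(logp)
            norm = m + math.log(sum(math.exp(lp - m) for lp in logp))
            resp[i] = [math.exp(lp - norm) for lp in logp]

    return priors, cond, resp


def hard_labels(resp):
    """Assign each data point to its most probable cluster."""
    return [max(range(len(r)), key=lambda k: r[k]) for r in resp]


# Toy usage: two well-separated groups of binary records.
toy = [(0, 0, 0, 0)] * 20 + [(1, 1, 1, 1)] * 20
_, _, toy_resp = em_latent_class(toy, n_clusters=2, n_values=2)
print(hard_labels(toy_resp))
```

A structure-learning method of the kind described above would wrap a loop like this inside a search over network structures, scoring each candidate with the expected statistics; that outer loop is deliberately left out here.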
