Model Matching: A Novel Framework Using a Clustering Strategy to Solve the Classification Problem

It is common practice to handle labeled data with classifiers and unlabeled data with clustering algorithms. Traditional Bayesian network classifiers (BNC$^{\mathcal{T}}$s) learned from a labeled training set $\mathcal{T}$ map the unlabeled test instance directly into the network structure to compute the conditional probability for classification; this neglects the information hidden in the unlabeled data and can result in classification bias. To address this issue, we propose a novel learning framework, called model matching, that uses the "clustering" strategy to solve the classification problem. The labeled data is partitioned into clusters according to class label, and a BNC$^{\mathcal{T}}$ is learned from each cluster; a corresponding set of BNC$^{p}$s is then built for each unlabeled test instance. To classify an instance, cross entropy is applied to compare the structural similarity between each BNC$^{\mathcal{T}}$ and its corresponding BNC$^{p}$. Extensive experimental results on 46 datasets from the University of California at Irvine (UCI) machine learning repository demonstrate that model matching improves the generalization performance of BNCs and outperforms several state-of-the-art classifiers, such as tree-augmented naive Bayes and random forest.
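To make the pipeline concrete, the Python sketch below illustrates one plausible reading of the framework; it is an illustrative approximation, not the authors' algorithm. Each class-specific BNC$^{\mathcal{T}}$ is reduced to a matrix of pairwise mutual information learned from that class's cluster, the instance-specific BNC$^{p}$ is approximated by smoothed pointwise mutual information over the test instance's attribute-value pairs, and the cross-entropy comparison is taken over the normalized weight matrices. All function names are hypothetical.

```python
# Hypothetical sketch of the model-matching idea: per-class structures are
# approximated by attribute-pair mutual-information matrices, and a test
# instance is matched to the class whose structure it fits best under a
# cross-entropy comparison. Not the paper's exact BNC learning procedure.
import numpy as np
from collections import Counter

def mutual_info_matrix(X):
    """Pairwise mutual information between discrete attributes (columns of X)."""
    n, d = X.shape
    M = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            joint = Counter(zip(X[:, i], X[:, j]))
            pi, pj = Counter(X[:, i]), Counter(X[:, j])
            mi = 0.0
            for (a, b), c in joint.items():
                pab = c / n
                mi += pab * np.log(pab / ((pi[a] / n) * (pj[b] / n)))
            M[i, j] = M[j, i] = mi
    return M

def pointwise_mi_matrix(x, X):
    """Smoothed pointwise MI of the instance's value pairs under class data X."""
    n, d = X.shape
    M = np.zeros((d, d))
    for i in range(d):
        for j in range(i + 1, d):
            # Laplace-smoothed estimates of the instance's observed value pair
            pab = (np.sum((X[:, i] == x[i]) & (X[:, j] == x[j])) + 1) / (n + 2)
            pa = (np.sum(X[:, i] == x[i]) + 1) / (n + 2)
            pb = (np.sum(X[:, j] == x[j]) + 1) / (n + 2)
            # Clamp negative PMI so weights stay nonnegative for normalization
            M[i, j] = M[j, i] = max(np.log(pab / (pa * pb)), 0.0)
    return M

def cross_entropy(P, Q, eps=1e-12):
    """Cross entropy between two weight matrices, normalized to distributions."""
    p = P.flatten() / (P.sum() + eps)
    q = Q.flatten() / (Q.sum() + eps)
    return -np.sum(p * np.log(q + eps))

def model_matching_predict(x, clusters):
    """clusters: dict mapping each class label -> training rows of that class."""
    structures = {c: mutual_info_matrix(Xc) for c, Xc in clusters.items()}
    scores = {c: cross_entropy(pointwise_mi_matrix(x, Xc), structures[c])
              for c, Xc in clusters.items()}
    return min(scores, key=scores.get)  # class whose structure matches best

# Tiny usage example with synthetic discrete data (two classes, four attributes)
rng = np.random.default_rng(0)
clusters = {c: rng.integers(0, 3, size=(50, 4)) for c in (0, 1)}
print(model_matching_predict(rng.integers(0, 3, size=4), clusters))
```

The key design choice the sketch tries to capture is that classification is posed as a matching problem: rather than pushing the instance through a single global network, each class contributes its own structure, and the instance is assigned to the class whose structure it resembles most.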
