Latent classification models for binary data

One of the simplest, yet most consistently well-performing, classes of classifiers is the naive Bayes model (a special class of Bayesian network models). However, these models rely on the (naive) assumption that all attributes used to describe an instance are conditionally independent given the class of that instance. To relax this independence assumption, we have in previous work proposed a family of models called latent classification models (LCMs). LCMs are defined for continuous domains and generalize the naive Bayes model by using latent variables to model class-conditional dependencies between the attributes. In addition to providing good classification accuracy, the LCM has several appealing properties, including a relatively small parameter space that makes it less susceptible to over-fitting. In this paper we take a first step towards generalizing LCMs to hybrid domains by proposing an LCM for domains with binary attributes. We present algorithms for learning the proposed model and describe a variational approximation-based inference procedure. Finally, we empirically compare the accuracy of the proposed model to that of other classifiers on a number of domains, including the problem of recognizing symbols in black-and-white images.
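
To make the contrast with naive Bayes concrete, here is a minimal generative sketch of the kind of model the abstract describes. The specific parameterization (a class-conditional Gaussian latent vector pushed through a sigmoid-Bernoulli link) and all names (`sample_binary_lcm`, `loadings`, and so on) are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def sample_binary_lcm(n, class_probs, latent_means, loadings, biases):
    """Sample n (class, attributes) pairs from a hypothetical binary LCM.

    Assumed generative story (one plausible parameterization):
      c      ~ Categorical(class_probs)                 # class label
      z | c  ~ N(latent_means[c], I)                    # k-dim latent vector
      x_i|z  ~ Bernoulli(sigmoid(loadings[i] @ z + biases[i]))

    Given z, the attributes are independent, so any dependence between
    attributes beyond the class is routed through the latent z -- this is
    how an LCM relaxes the naive Bayes assumption, where each x_i would
    depend on c alone.
    """
    labels, data = [], []
    for _ in range(n):
        c = rng.choice(len(class_probs), p=class_probs)
        z = rng.normal(latent_means[c], 1.0)       # latent draw for this instance
        p = sigmoid(loadings @ z + biases)         # per-attribute Bernoulli means
        data.append((rng.random(p.shape) < p).astype(int))
        labels.append(c)
    return np.asarray(labels), np.asarray(data)

# Toy usage: 2 classes, 3-dim latent space, 10 binary attributes.
k, d = 3, 10
labels, X = sample_binary_lcm(
    n=200,
    class_probs=np.array([0.5, 0.5]),
    latent_means=rng.normal(size=(2, k)),
    loadings=rng.normal(size=(d, k)),
    biases=np.zeros(d),
)
```

Note the parameter economy the abstract alludes to: with a k-dimensional latent space, the per-class cost is roughly d·k loading weights rather than a full joint table over d binary attributes.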

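The abstract mentions the variational inference procedure only at a high level. A standard device for making variational inference tractable under sigmoid-Bernoulli likelihoods (assumed here for illustration; the paper's exact construction may differ) is the Jaakkola–Jordan quadratic lower bound on the logistic sigmoid:

```latex
% Quadratic (Jaakkola--Jordan) lower bound on the logistic sigmoid
\sigma(x) \;\ge\; \sigma(\xi)\,
  \exp\!\left\{ \frac{x-\xi}{2} - \lambda(\xi)\left(x^{2}-\xi^{2}\right) \right\},
\qquad
\lambda(\xi) = \frac{1}{4\xi}\tanh\frac{\xi}{2}.
```

Because the bound is quadratic in x, substituting x = w_i^T z + b_i yields an expression that is Gaussian in the latent vector z, so posterior expectations stay in closed form; the variational parameters ξ are then re-optimized at each iteration of an EM-style learning loop.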