Improving Naive Bayes Using Class-Conditional ICA

In recent years, Naive Bayes has experienced a renaissance in machine learning, particularly in information retrieval. This classifier rests on the not always realistic assumption that class-conditional distributions can be factorized as the product of their marginal densities. On the other hand, one of the most common ways of estimating the Independent Component Analysis (ICA) representation of a random vector consists of minimizing the Kullback-Leibler divergence between the joint density and the product of the marginal densities (the mutual information). It follows that ICA provides a representation in which the independence assumption holds on stronger grounds. In this paper we propose class-conditional ICA as a method that provides an adequate representation in which Naive Bayes is the classifier of choice. Experiments on two public databases are performed to confirm this hypothesis.
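The procedure the abstract describes can be read as: fit a separate ICA basis for each class, estimate a one-dimensional density for each resulting component, and combine these marginals as a Naive Bayes classifier (together with the change-of-variables Jacobian and the class prior). The sketch below is a minimal illustration of that idea using scikit-learn's FastICA and kernel density estimates; the class name, bandwidth, and choice of marginal density estimator are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.neighbors import KernelDensity


class ClassConditionalICANB:
    """Hypothetical sketch: one ICA basis per class, Naive Bayes in that basis."""

    def __init__(self, bandwidth=0.3):
        self.bandwidth = bandwidth  # illustrative KDE bandwidth, not from the paper

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            ica = FastICA(random_state=0).fit(Xc)   # square unmixing matrix W_c (assumes n_samples >= n_features)
            Sc = ica.transform(Xc)                  # independent components s = W_c (x - mu_c)
            # one 1-D marginal density estimate per independent component
            kdes = [KernelDensity(bandwidth=self.bandwidth).fit(Sc[:, [j]])
                    for j in range(Sc.shape[1])]
            log_prior = np.log(len(Xc) / len(X))
            # log|det W_c| accounts for the change of variables back to the input space
            log_det = np.linalg.slogdet(ica.components_)[1]
            self.models_[c] = (ica, kdes, log_prior, log_det)
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            ica, kdes, log_prior, log_det = self.models_[c]
            S = ica.transform(X)
            # Naive Bayes in the ICA basis: sum of the marginal log-densities
            log_lik = sum(kde.score_samples(S[:, [j]]) for j, kde in enumerate(kdes))
            scores.append(log_prior + log_det + log_lik)
        return self.classes_[np.argmax(np.vstack(scores), axis=0)]
```

Because each class gets its own unmixing matrix, the marginal densities are estimated in a representation where the factorization assumption is closer to holding, which is the core argument of the paper.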
