On the Selection and Classification of Independent Features

This paper focuses on the problems of feature selection and classification when classes are modeled by statistically independent features. We show that, under the assumption of class-conditional independence, the divergence measure of class separability simplifies greatly, becoming a sum of one-dimensional divergences and thus providing a feature selection criterion that requires no exhaustive search. Since the independence hypothesis is rarely met in practice, we also provide a framework based on class-conditional Independent Component Analyzers, under which this assumption can be held on stronger grounds. Divergence and the Bayes decision scheme are adapted to this class-conditional representation. An algorithm that integrates the proposed representation, feature selection technique, and classifier is presented. Experiments on artificial, benchmark, and real-world data illustrate our technique and evaluate its performance.
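To make the additive-divergence idea concrete, here is a minimal sketch (not the authors' implementation) of the resulting feature selection criterion: when the class-conditional divergence decomposes into a sum of one-dimensional terms, features can be ranked by their individual divergences instead of searched exhaustively. For a closed form we assume Gaussian class-conditional marginals, so each per-feature term is the symmetric Kullback-Leibler (J) divergence between two univariate Gaussians; all function and variable names below are ours.

```python
import numpy as np

def gaussian_j_divergence(mu1, var1, mu2, var2):
    """Symmetric KL (Kullback's J) divergence between two 1-D Gaussians.

    Under class-conditional independence, the divergence of a feature
    subset is the sum of these per-feature terms, so each feature can
    be scored independently.
    """
    return 0.5 * (var1 / var2 + var2 / var1 - 2.0) \
         + 0.5 * (mu1 - mu2) ** 2 * (1.0 / var1 + 1.0 / var2)

def select_features(X1, X2, k):
    """Pick the k most divergent features between two classes.

    X1, X2: (n_samples, n_features) arrays, one per class.
    Assumes (for illustration only) Gaussian marginals within each class.
    """
    mu1, var1 = X1.mean(axis=0), X1.var(axis=0)
    mu2, var2 = X2.mean(axis=0), X2.var(axis=0)
    j = gaussian_j_divergence(mu1, var1, mu2, var2)  # one score per feature
    order = np.argsort(j)[::-1]                      # most separable first
    return order[:k], j[order[:k]]

# Toy usage: only feature 0 differs between the classes, so it ranks first.
rng = np.random.default_rng(0)
X1 = rng.normal(0.0, 1.0, size=(500, 5))
X2 = rng.normal(0.0, 1.0, size=(500, 5))
X2[:, 0] += 3.0
print(select_features(X1, X2, k=2))
```

The second half of the abstract combines a class-conditional ICA representation with a Bayes decision rule. The sketch below is one plausible instantiation under stated assumptions, not the paper's algorithm: scikit-learn's FastICA stands in for the ICA estimator, a Gaussian kernel density estimate stands in for the one-dimensional source densities, and the class-conditional likelihood factorizes over the estimated components with a change-of-variables term for the unmixing matrix.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.decomposition import FastICA

class ICANaiveBayes:
    """One ICA per class; the likelihood factorizes over its components.

    Hypothetical sketch: log p(x | c) = sum_i log p_i(s_i) + log|det W_c|,
    where s = W_c (x - m_c) are the estimated independent components.
    """

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.models_ = {}
        for c in self.classes_:
            Xc = X[y == c]
            ica = FastICA(random_state=0)           # whitening on by default
            S = ica.fit_transform(Xc)               # one source per column
            kdes = [gaussian_kde(S[:, i]) for i in range(S.shape[1])]
            # Jacobian of x -> s; assumes n_samples > n_features so the
            # unmixing matrix ica.components_ is square.
            logdet = np.linalg.slogdet(ica.components_)[1]
            prior = np.log(len(Xc) / len(X))
            self.models_[c] = (ica, kdes, logdet, prior)
        return self

    def predict(self, X):
        scores = []
        for c in self.classes_:
            ica, kdes, logdet, prior = self.models_[c]
            S = ica.transform(X)
            loglik = sum(kde.logpdf(S[:, i]) for i, kde in enumerate(kdes))
            scores.append(prior + logdet + loglik)
        return self.classes_[np.argmax(np.vstack(scores), axis=0)]
```

Both sketches only illustrate the structure the abstract describes; the paper's own divergence estimates and source-density models may differ.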
