Binary Feature Selection with Conditional Mutual Information

In the context of classification, we propose to use conditional mutual information to select a family of binary features which are individually discriminating and weakly dependent. We show that on an image classification task, despite its simplicity, a naive Bayesian classifier based on features selected with this Conditional Mutual Information Maximization (CMIM) criterion performs as well as a classifier built with AdaBoost. We also show that this classification method is more robust than boosting when trained on a noisy data set.

Key-words: classification, feature selection, Bayesian classifier, mutual information
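As a rough illustration of the greedy selection the abstract describes, the following Python sketch picks, at each step, the feature whose worst-case conditional mutual information with the class, given any already selected feature, is largest. It is a minimal, unoptimized sketch under stated assumptions: the function names (mutual_information, conditional_mutual_information, cmim_select) and the brute-force score updates are illustrative choices, not the authors' fast implementation.

import numpy as np

def mutual_information(x, y):
    # Empirical mutual information I(X; Y) in bits for two binary arrays.
    mi = 0.0
    for xv in (0, 1):
        for yv in (0, 1):
            pxy = np.mean((x == xv) & (y == yv))
            px, py = np.mean(x == xv), np.mean(y == yv)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

def conditional_mutual_information(x, y, z):
    # Empirical I(X; Y | Z): average of I(X; Y) over the two values of Z.
    cmi = 0.0
    for zv in (0, 1):
        mask = (z == zv)
        pz = np.mean(mask)
        if pz > 0:
            cmi += pz * mutual_information(x[mask], y[mask])
    return cmi

def cmim_select(X, y, k):
    # Greedy CMIM: repeatedly pick the feature f maximizing
    # min over already selected g of I(Y; X_f | X_g).
    # X is an (n_samples, n_features) binary matrix, y a binary label vector.
    n_features = X.shape[1]
    # Scores start as plain mutual information I(Y; X_f) and are
    # progressively lowered as more conditioning features are selected.
    score = np.array([mutual_information(X[:, f], y) for f in range(n_features)])
    selected = []
    for _ in range(k):
        best = int(np.argmax(score))
        selected.append(best)
        score[best] = -np.inf          # never re-select the same feature
        for f in range(n_features):
            if np.isfinite(score[f]):
                score[f] = min(score[f],
                               conditional_mutual_information(X[:, f], y, X[:, best]))
    return selected

For example, cmim_select(X_train, y_train, 50) would return the indices of fifty individually informative yet weakly dependent binary features, which could then be fed to a naive Bayesian classifier as in the experiments the abstract summarizes.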
