An unsupervised approach to feature discretization and selection

Many learning problems involve high-dimensional datasets with a relatively small number of instances. Learning algorithms are thus confronted with the curse of dimensionality and must address it to be effective. Examples of such data include bag-of-words representations in text classification and gene expression data for tumor detection/classification. Among the many features characterizing the instances, a large fraction may be irrelevant (or even detrimental) to the learning task. There is thus a clear need for adequate techniques for feature representation, reduction, and selection, to improve both classification accuracy and memory requirements. In this paper, we propose combined unsupervised feature discretization and feature selection techniques, suitable for medium- and high-dimensional datasets. Experimental results on several standard datasets, with both sparse and dense features, demonstrate the effectiveness of the proposed techniques and show improvements over previous related methods.
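The abstract summarizes the approach without detailing the quantizer or the selection criterion. As a hedged illustration of the general recipe (discretize each feature without using class labels, then rank the discretized features by an unsupervised relevance score), the sketch below uses equal-width binning and per-feature bin entropy as stand-in choices; both are assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def discretize_equal_width(X, n_bins=8):
    """Unsupervised discretization: map each feature to integer bin
    indices 0..n_bins-1 using equal-width cut points over its own range.
    (Illustrative stand-in; the paper's quantizer may differ.)"""
    Xd = np.empty(X.shape, dtype=int)
    for j in range(X.shape[1]):
        lo, hi = X[:, j].min(), X[:, j].max()
        cuts = np.linspace(lo, hi, n_bins + 1)[1:-1]  # interior bin edges
        Xd[:, j] = np.searchsorted(cuts, X[:, j], side="right")
    return Xd

def select_by_entropy(Xd, n_keep, n_bins=8):
    """Unsupervised selection: score each discretized feature by the
    entropy of its bin occupancy (a label-free relevance proxy) and
    keep the n_keep highest-scoring features."""
    scores = []
    for j in range(Xd.shape[1]):
        p = np.bincount(Xd[:, j], minlength=n_bins) / Xd.shape[0]
        p = p[p > 0]
        scores.append(-(p * np.log(p)).sum())
    order = np.argsort(scores)[::-1]          # highest entropy first
    return np.sort(order[:n_keep])

rng = np.random.default_rng(0)
n = 300
X = rng.normal(scale=0.5, size=(n, 6))  # peaked features: low bin entropy
X[:, 0] = rng.uniform(size=n)           # spread-out feature: high bin entropy
X[:, 3] = rng.uniform(size=n)           # spread-out feature: high bin entropy

Xd = discretize_equal_width(X)
kept = select_by_entropy(Xd, n_keep=2)
```

On this synthetic data, the two uniformly distributed features occupy their bins more evenly than the peaked Gaussian ones, so the entropy score ranks them first; a real pipeline would then train a classifier on `Xd[:, kept]` only.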
