A Kolmogorov-Smirnov Correlation-Based Filter for Microarray Data

A filter algorithm using the F-measure has been combined with feature redundancy removal based on the Kolmogorov-Smirnov (KS) test for approximate equality of statistical distributions. The result is a computationally efficient K-S Correlation-Based Selection algorithm, which has been tested on three high-dimensional microarray datasets using four types of classifiers. The results are encouraging, and several improvements are suggested.
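The redundancy-removal idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dictionary-based data layout, the significance level of 0.05, and the use of the large-sample critical value for the two-sample KS statistic are all assumptions made for illustration. Relevance scores (for example, an F-measure per feature) are taken as given input rather than computed here. Features are visited in order of decreasing relevance, and a feature is dropped when the KS test cannot distinguish its distribution from that of an already kept feature.

```python
from bisect import bisect_right
from math import log, sqrt


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # empirical CDF of a at x
        fb = bisect_right(sb, x) / len(sb)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d


def ks_critical(n, m, alpha=0.05):
    """Large-sample critical value of D at significance level alpha (assumed choice)."""
    c = sqrt(-0.5 * log(alpha / 2.0))
    return c * sqrt((n + m) / (n * m))


def ks_redundancy_filter(features, relevance, alpha=0.05):
    """Hypothetical helper: keep relevant features, drop KS-indistinguishable ones.

    features:  dict mapping feature name -> list of numeric values
    relevance: dict mapping feature name -> relevance score (higher is better)
    """
    ranked = sorted(features, key=lambda name: relevance[name], reverse=True)
    kept = []
    for name in ranked:
        col = features[name]
        # A feature is treated as redundant if equality of distributions with
        # some already kept feature is NOT rejected at level alpha.
        redundant = any(
            ks_statistic(col, features[k])
            < ks_critical(len(col), len(features[k]), alpha)
            for k in kept
        )
        if not redundant:
            kept.append(name)
    return kept


if __name__ == "__main__":
    data = {
        "g1": [float(i) for i in range(20)],
        "g2": [i + 0.5 for i in range(20)],    # nearly the same distribution as g1
        "g3": [100.0 + i for i in range(20)],  # clearly different distribution
    }
    scores = {"g1": 0.9, "g2": 0.8, "g3": 0.7}
    print(ks_redundancy_filter(data, scores))
```

In practice, features measured on different scales would need to be standardized or discretized before their distributions are compared; that preprocessing step is omitted from this sketch.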
