High dimensional microarray data classification using correlation based feature selection

Analyzing DNA microarray data poses a serious challenge because of the large number of features (genes) and the relatively small number of samples. Extracting features that have predictive capability for classifying these huge datasets demands appropriate approaches, such as feature reduction and identification of an optimal set of genes. In this paper, along with conventional statistical methods such as filtering the dataset to reduce the number of features, one additional step is performed: for each feature, the correlation between the classes is evaluated. The proposed approach yields higher classification accuracy on both the Acute Lymphoblastic Leukemia (ALL) and High Grade Glioma cancer datasets than traditional statistical filtering methods alone.
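The two-stage idea in the abstract (a statistical filter followed by a per-feature correlation check) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not name the filter or the correlation measure, so a variance filter and the absolute Pearson correlation between each gene and a binary class label are used here as stand-ins. The function names, the `keep_fraction` and `top_k` parameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def variance_filter(X, keep_fraction=0.5):
    """Return indices of the most variable features (assumed pre-filter)."""
    variances = X.var(axis=0)
    n_keep = max(1, int(round(keep_fraction * X.shape[1])))
    # Take the n_keep highest-variance columns, reported in ascending index order.
    return np.sort(np.argsort(variances)[::-1][:n_keep])

def class_correlation(X, y):
    """Absolute Pearson correlation of each feature with a binary 0/1 label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant features score 0 instead of NaN
    return np.abs(Xc.T @ yc) / denom

def select_features(X, y, keep_fraction=0.5, top_k=10):
    """Filter by variance, then keep the top_k features most correlated with y."""
    kept = variance_filter(X, keep_fraction)
    scores = class_correlation(X[:, kept], y)
    order = np.argsort(scores)[::-1][:top_k]
    return kept[order]

# Synthetic stand-in for a microarray matrix: few samples, many genes.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 500
y = np.repeat([0, 1], n_samples // 2)
X = rng.normal(size=(n_samples, n_genes))
X[:, :5] += 3.0 * y[:, None]  # make the first five genes class-dependent

selected = select_features(X, y, keep_fraction=0.5, top_k=5)
print("selected gene indices:", selected)
```

The selected columns would then be passed to whichever classifier is being evaluated. To avoid an optimistic accuracy estimate, the selection step should be run inside each cross-validation fold rather than once on the full dataset.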
