Integrative Gene Selection for Classification of Microarray Data

Microarray data classification is a major interest in health informatics; it aims to discover hidden patterns in gene expression profiles. The main challenge in building such a classification system is the curse of dimensionality. Consequently, a considerable number of studies address gene selection methods for building effective classification models. However, most approaches consider gene expression values alone, so the selected genes might not be biologically meaningful. This paper presents an integrative gene selection approach for improving microarray data classification performance. The proposed approach employs association analysis to integrate gene expression and biological data in identifying informative genes. The experimental results show that the proposed gene selection outperformed the traditional method in terms of accuracy and number of selected genes.
