A two-stage machine learning approach for pathway analysis

Analysis of gene expression data has emerged as an important approach for discovering active pathways related to biological phenotypes. Previous pathway analysis methods use all genes in a pathway to link it to a particular phenotype. Using only a subset of informative genes, however, could classify samples more accurately. Here, we propose a two-stage machine learning approach for pathway analysis. In the first stage, feature selection methods select the informative genes that can represent a pathway. These “representative genes” are mostly associated with the phenotype of interest. In the second stage, classification methods rank the pathways based on their “representative genes”. We applied our two-stage approach to three gene expression datasets. The results indicate that our method outperforms methods that consider every gene in a pathway.
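The two stages can be illustrated with a minimal sketch. It is not the implementation evaluated here: the abstract does not name the feature selection or classification methods, so the sketch assumes a simple standardized mean-difference score for stage one and a leave-one-out nearest-centroid classifier for stage two. All function names, the toy expression values, and the pathway names are hypothetical.

```python
from statistics import mean, pstdev

def gene_score(values, labels):
    """Absolute difference of class means, scaled by the within-class spread."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = pstdev(a) + pstdev(b) + 1e-9  # small constant avoids division by zero
    return abs(mean(a) - mean(b)) / spread

def select_genes(expr, labels, genes, k):
    """Stage 1 (assumed filter method): keep the k best-separating genes."""
    ranked = sorted(genes, key=lambda g: gene_score(expr[g], labels), reverse=True)
    return ranked[:k]

def loo_accuracy(expr, labels, genes, k):
    """Stage 2 (assumed classifier): leave-one-out nearest-centroid accuracy.

    Gene selection is repeated inside each fold so that the held-out
    sample never influences which genes are chosen.
    """
    n = len(labels)
    correct = 0
    for i in range(n):
        train = [j for j in range(n) if j != i]
        train_labels = [labels[j] for j in train]
        train_expr = {g: [expr[g][j] for j in train] for g in genes}
        chosen = select_genes(train_expr, train_labels, genes, k)
        dist = {}
        for c in (0, 1):
            centroid = [
                mean(v for v, y in zip(train_expr[g], train_labels) if y == c)
                for g in chosen
            ]
            dist[c] = sum((expr[g][i] - m) ** 2 for g, m in zip(chosen, centroid))
        predicted = min(dist, key=dist.get)
        correct += predicted == labels[i]
    return correct / n

def rank_pathways(expr, labels, pathways, k=1):
    """Rank pathways by how well their representative genes classify samples."""
    scores = {name: loo_accuracy(expr, labels, genes, k)
              for name, genes in pathways.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: six samples, two phenotype classes.
labels = [0, 0, 0, 1, 1, 1]
expr = {
    "g1": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # clearly tracks the phenotype
    "g2": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0],  # uninformative
    "g3": [1.5, 2.5, 2.0, 2.1, 1.4, 2.4],  # uninformative
    "g4": [0.5, 0.7, 0.6, 0.6, 0.5, 0.7],  # uninformative
}
pathways = {"pathway_A": ["g1", "g2", "g3"], "pathway_B": ["g3", "g4"]}

representatives = select_genes(expr, labels, pathways["pathway_A"], k=1)
ranking = rank_pathways(expr, labels, pathways, k=1)
print(representatives)
print(ranking)
```

In this toy setting, pathway_A contains one informative gene among two uninformative ones, so reducing it to its representative gene lets it rank above pathway_B. Any feature selection or classification method could be substituted for the two assumed here.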
