Pathway activity transformation for multi-class classification of lung cancer datasets

Pathway-based microarray analysis is a powerful tool for studying disease mechanisms and identifying biological markers of complex diseases such as lung cancer. Previous studies have shown that pathway activity transformed from gene expression data is more informative for disease classification. However, existing pathway activity transformation methods address only binary classification. In this study, we propose a pathway activity transformation method for multi-class data, termed Analysis-of-Variance-based Feature Set (AFS). Pathway activity derived with the proposed method yields high classification power in three-fold cross-validation and is robust in cross-dataset validation on all four lung cancer datasets used.
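The abstract does not spell out how AFS is computed, so the following is only a minimal sketch of the general idea of an ANOVA-driven pathway activity transformation for multi-class data, not the authors' method. It assumes (hypothetically) that each gene in a pathway is scored by a one-way ANOVA F-statistic across the class labels, that the `top_k` highest-scoring member genes are kept, and that pathway activity is the per-sample mean of their expression values. The function names, the `top_k` parameter, and the toy data are all illustrative.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one gene's expression across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def pathway_activity(expr, labels, pathway_genes, top_k=2):
    """Assumed scheme: keep the top_k member genes by F-statistic and
    average their expression per sample. Not the published AFS definition."""
    members = [g for g in pathway_genes if g in expr]
    ranked = sorted(members, key=lambda g: anova_f(expr[g], labels), reverse=True)
    selected = ranked[:top_k]
    activity = [mean([expr[g][i] for g in selected]) for i in range(len(labels))]
    return selected, activity


# Toy data: six samples from three hypothetical classes.
labels = ["A", "A", "B", "B", "C", "C"]
expr = {
    "g1": [1.0, 1.2, 3.0, 3.2, 5.0, 5.2],  # separates the classes well
    "g2": [2.0, 2.1, 2.0, 2.1, 2.0, 2.1],  # no class signal
    "g3": [1.0, 3.0, 1.5, 3.5, 2.0, 4.0],  # weak class signal
}
selected, activity = pathway_activity(expr, labels, ["g1", "g2", "g3"], top_k=2)
print(selected, activity)
```

In practice the expression values would usually be standardized per gene before averaging, and the resulting pathway activity vectors would be passed to a multi-class classifier; both steps are omitted here for brevity.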
