Learning Microarray Cancer Datasets by Random Forests and Support Vector Machines

Analyzing gene expression data from microarray devices has many important applications in medicine and biology, including the diagnosis of disease, accurate prognosis for individual patients, and understanding how a disease responds to drugs. Two classifiers, random forests and support vector machines, are studied in application to microarray cancer datasets. The performance of both classifiers was evaluated with different numbers of genes to determine whether a smaller set of informative genes yields a better classification rate.
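The experiment described above, which compares random forests and SVMs as the number of retained genes changes, can be sketched with scikit-learn. This is a minimal illustration, not the study's actual protocol. The gene-ranking criterion (an ANOVA F-test via `SelectKBest`), the linear SVM kernel, the forest size, and stratified k-fold cross-validation are all assumptions, because the abstract names none of them. The helper `evaluate_gene_counts` is hypothetical, and the data are synthetic, not any of the cancer datasets.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def evaluate_gene_counts(X, y, gene_counts, n_splits=5, seed=0):
    """Hypothetical helper: mean CV accuracy of RF and SVM for each gene count k.

    Assumption: genes are ranked by an ANOVA F-test. Ranking happens inside
    each training fold (via the pipeline), so held-out samples never
    influence which genes are kept.
    """
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {}
    for k in gene_counts:
        models = {
            "random_forest": make_pipeline(
                SelectKBest(f_classif, k=k),
                RandomForestClassifier(n_estimators=200, random_state=seed),
            ),
            "svm": make_pipeline(
                SelectKBest(f_classif, k=k),
                StandardScaler(),  # SVMs are sensitive to feature scale
                SVC(kernel="linear", C=1.0),  # assumed kernel and C
            ),
        }
        results[k] = {
            name: float(np.mean(cross_val_score(model, X, y, cv=cv, scoring="accuracy")))
            for name, model in models.items()
        }
    return results


if __name__ == "__main__":
    # Synthetic stand-in for a microarray matrix: few samples, many genes.
    X, y = make_classification(n_samples=72, n_features=1000, n_informative=20,
                               n_redundant=0, random_state=0)
    for k, scores in evaluate_gene_counts(X, y, [10, 50, 200, 1000]).items():
        print(k, {name: round(acc, 3) for name, acc in scores.items()})
```

Placing gene selection inside the pipeline is the key design choice. If genes were ranked once on the full dataset before cross-validation, the accuracy estimates for small gene sets would be optimistically biased.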
