Feature selection and classification of gene expression profile in hereditary breast cancer

Correct classification and prediction of tumor cells is essential for successful diagnosis and reliable future treatment. However, distinguishing between tumor classes is very challenging when microarray data contain expression measurements for thousands of genes. Removing irrelevant genes helps reveal the relationship between genes and tumors. In this paper we use two feature selection methods, the multivariate permutation test (MPT) and significance analysis of microarrays (SAM), to select significant genes. Using the selected genes as features, we apply support vector machines (SVMs) with polynomial, radial, and linear kernels to predict the class of the test data. With the polynomial kernel, the SVM classified all samples correctly (100% accuracy), while the linear kernel produced no misclassifications for the BRCA1-BRCA2 and BRCA1-sporadic comparisons.
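
To make the idea of permutation-based gene selection concrete, the following is a minimal sketch and not the MPT procedure used in the paper. It runs a simple per-gene permutation test on the difference in class means, without the multivariate control of false discoveries that MPT provides. The function names, the toy expression values, the 0.05 threshold, and the number of permutations are all assumptions made for illustration.

```python
import random
from statistics import mean


def permutation_p_value(values, labels, n_perm=2000, rng=None):
    """Two-sided permutation p-value for one gene's difference in class means.

    values: expression values, one per sample
    labels: 0/1 class labels, one per sample (hypothetical encoding)
    """
    rng = rng or random.Random(0)

    def mean_diff(lab):
        group0 = [v for v, l in zip(values, lab) if l == 0]
        group1 = [v for v, l in zip(values, lab) if l == 1]
        return abs(mean(group0) - mean(group1))

    observed = mean_diff(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling the labels breaks any real link between gene and class.
        rng.shuffle(shuffled)
        if mean_diff(shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def select_genes(expression, labels, alpha=0.05, n_perm=2000, seed=0):
    """Return the genes whose permutation p-value falls below alpha."""
    rng = random.Random(seed)
    return [
        gene
        for gene, values in expression.items()
        if permutation_p_value(values, labels, n_perm, rng) < alpha
    ]


# Toy data (invented for illustration): 6 samples per class.
labels = [0] * 6 + [1] * 6
toy_expression = {
    # Clearly higher in class 1 than in class 0.
    "gene_a": [1.0, 1.1, 0.9, 1.2, 0.8, 1.0, 3.0, 3.1, 2.9, 3.2, 2.8, 3.0],
    # Same distribution in both classes.
    "gene_b": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0, 1.0, 2.0, 3.0],
}

selected = select_genes(toy_expression, labels)
print(selected)
```

On real microarray data, the selected genes would then be passed as features to a classifier such as an SVM. Because thousands of genes are tested at once, a per-gene threshold like the one above needs a multiple-testing correction, which is the role that procedures such as MPT and SAM play.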
