Classification of serous ovarian tumors based on microarray data using multicategory support vector machines

Ovarian cancer, the most fatal of the reproductive cancers, is the fifth leading cause of cancer death in women in the United States. Serous borderline ovarian tumors (SBOTs) are considered to be earlier or less malignant forms of serous ovarian carcinomas (SOCs). SBOTs are asymptomatic, and progression to advanced stages is common. Using DNA microarray technology, we designed multicategory classification models to discriminate between ovarian cancer subclasses. To obtain models with optimal parameters and features, we systematically evaluated three machine learning algorithms and three feature selection methods using five-fold cross-validation and a grid search. The microarray data, comprising 54,675 probe sets, were obtained from the National Center for Biotechnology Information Gene Expression Omnibus repository and covered 113 subjects: 22 with normal ovarian surface epithelial cells, 12 with SBOTs, and 79 with SOCs. The optimal model, a one-versus-rest support vector machine with signal-to-noise feature selection, achieved an accuracy of 97.3%, a relative classifier information of 0.916, and a kappa index of 0.941. In addition, five features, including the expression of the putative biomarkers SNTN and AOX1, were selected to differentiate between the normal, SBOT, and SOC groups. Accurate diagnosis of ovarian tumor subclasses by multicategory machine learning would be cost-effective and simple to perform, and would ensure more effective subclass-targeted therapy.
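As a rough illustration of two quantities named in the abstract, the sketch below scores features with a one-versus-rest signal-to-noise ratio and computes a kappa index from predicted labels. The abstract does not give the exact formulas, so this assumes the commonly used form (difference of class means divided by the sum of class standard deviations) and Cohen's kappa. The function names `s2n_scores` and `cohen_kappa` and the toy data are hypothetical, not taken from the study.

```python
from statistics import mean, pstdev


def s2n_scores(X, y, positive):
    """Score each feature for one class against all others.

    Assumed formula: (mean_pos - mean_rest) / (sd_pos + sd_rest).
    A feature with zero spread in both groups gets a score of 0.0.
    """
    pos = [row for row, label in zip(X, y) if label == positive]
    rest = [row for row, label in zip(X, y) if label != positive]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in pos]
        b = [row[j] for row in rest]
        denom = pstdev(a) + pstdev(b)
        scores.append((mean(a) - mean(b)) / denom if denom > 0 else 0.0)
    return scores


def cohen_kappa(y_true, y_pred):
    """Agreement between true and predicted labels, corrected for chance."""
    n = len(y_true)
    labels = set(y_true) | set(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in labels
    )
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0


# Toy expression matrix: rows are samples, columns are probe sets.
X = [[1.0, 5.0], [2.0, 5.0], [8.0, 5.0], [9.0, 5.0]]
y = ["normal", "normal", "SOC", "SOC"]
scores = s2n_scores(X, y, positive="SOC")

# Toy predictions for three classes with one SBOT sample misclassified.
y_true = ["normal", "normal", "SBOT", "SBOT", "SOC", "SOC"]
y_pred = ["normal", "normal", "SBOT", "SOC", "SOC", "SOC"]
kappa = cohen_kappa(y_true, y_pred)
```

In a full pipeline, the top-ranked features per class would be passed to the classifier inside each cross-validation fold, so that feature selection never sees the held-out samples.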
