ICP: A novel approach to predict prognosis of prostate cancer with inner-class clustering of gene expression data

Prostate cancer is heterogeneous. Tumors that appear histologically similar often differ in their gene expression levels, and a single tumor may express both high-risk and low-risk cancer genes at several different levels. Accounting for the range of expression within each class through inner-class clustering can yield more useful models for stratifying prostate cancers into high-risk and low-risk categories. In this paper, we classify prostate cancers as high-risk (aggressive) or low-risk (non-aggressive) using ICP (Inner-class Clustering and Prediction). Our model classified samples more efficiently than the models built with the algorithms used for comparison. A number of the genes in the gene pairs used by our classifier turned out to be linked to prostate cancer, which indicates that the proposed method can also be used to find previously unknown genes and gene pairs that distinguish high-risk cancer from low-risk cancer.
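
The idea of inner-class clustering can be illustrated with a minimal sketch. This is not the ICP algorithm itself: the abstract does not specify the clustering method or how gene pairs enter the model, so the use of k-means, the nearest-centroid prediction rule, the synthetic two-feature data, and all function names below are assumptions made purely for illustration. Each class is clustered separately into sub-groups, and a new sample receives the class of the closest sub-group centroid.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a (k, n_features) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids


def fit_inner_class(X, y, k=2):
    """Cluster each class on its own (hypothetical helper, not from the paper).

    Returns the stacked sub-cluster centroids and the class that owns each one.
    """
    centroids, owners = [], []
    for cls in np.unique(y):
        class_centroids = kmeans(X[y == cls], k)
        centroids.append(class_centroids)
        owners.extend([cls] * len(class_centroids))
    return np.vstack(centroids), np.array(owners)


def predict(X, centroids, owners):
    """Label each sample with the class of its nearest sub-cluster centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return owners[dists.argmin(axis=1)]


# Synthetic data (an assumption): each class consists of two distinct
# sub-groups, so a single centroid per class could not separate them.
rng = np.random.default_rng(0)
blobs = {1: [(0, 0), (10, 10)],   # 1 = high-risk
         0: [(0, 10), (10, 0)]}   # 0 = low-risk
X = np.vstack([rng.normal(center, 0.5, size=(20, 2))
               for cls in blobs for center in blobs[cls]])
y = np.array([cls for cls in blobs for _ in blobs[cls] for _ in range(20)])

centroids, owners = fit_inner_class(X, y, k=2)
accuracy = float((predict(X, centroids, owners) == y).mean())
print(f"training accuracy: {accuracy:.2f}")
```

The toy data is arranged so that the mean of each class falls at the same point, which is the situation where modeling the spread inside a class matters. With real expression data the number of sub-clusters per class would have to be chosen, for example by cross-validation.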
