A personalized committee classification approach to improving prediction of breast cancer metastasis

MOTIVATION: Metastasis prediction is a well-known problem in breast cancer research. As breast cancer is a complex and heterogeneous disease with many molecular subtypes, predictive models trained on one cohort often perform poorly on other cohorts, and a combined model may be suboptimal for individual patients. Furthermore, attempts to develop subtype-specific models are hindered by the ambiguity and stereotypical definitions of subtypes.

RESULTS: Here, we propose a personalized approach that relaxes the definition of breast cancer subtypes. We assume that each patient belongs to a distinct subtype, defined implicitly by a set of patients with similar molecular characteristics, and construct a different predictive model for each patient, using as training data only the patients defining that subtype. To increase robustness, we also develop a committee-based prediction method that pools multiple personalized models. Using both intra- and inter-dataset validations, we show that our approach can significantly improve the prediction accuracy of breast cancer metastasis compared with several popular approaches, especially on hard-to-learn cases. Furthermore, we find that breast cancer patients belonging to different canonical subtypes tend to have different predictive models and gene signatures, suggesting that metastasis in different canonical subtypes is likely governed by different molecular mechanisms.

AVAILABILITY AND IMPLEMENTATION: Source code, implemented in MATLAB and Java, is available at www.cs.utsa.edu/~jruan/PCC/.
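The personalized committee idea described above can be illustrated with a minimal sketch. This is not the released MATLAB/Java implementation. The abstract does not specify the similarity measure, the base classifier, or how the committee is formed, so the sketch assumes Euclidean distance between expression profiles, a nearest-centroid base classifier, and a committee built by varying the neighborhood size `k` with a majority vote. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression profiles (assumed similarity measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, profiles, k):
    """Indices of the k training profiles most similar to the query patient."""
    order = sorted(range(len(profiles)), key=lambda i: euclidean(query, profiles[i]))
    return order[:k]


def centroid(rows):
    """Per-gene mean of a list of profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def personalized_predict(query, profiles, labels, k):
    """Train a nearest-centroid model on the query's k neighbors only, then classify the query."""
    by_class = {}
    for i in nearest_neighbors(query, profiles, k):
        by_class.setdefault(labels[i], []).append(profiles[i])
    # If the neighborhood holds a single class, that class is returned by construction.
    centroids = {c: centroid(rows) for c, rows in by_class.items()}
    return min(centroids, key=lambda c: euclidean(query, centroids[c]))


def committee_predict(query, profiles, labels, ks=(3, 5, 7)):
    """Pool several personalized models (one per neighborhood size) by majority vote."""
    votes = Counter(personalized_predict(query, profiles, labels, k) for k in ks)
    return votes.most_common(1)[0][0]


# Toy data: two genes per patient; label 1 = metastasis, 0 = no metastasis (hypothetical values).
profiles = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [0.15, 0.15],
            [0.9, 1.0], [1.0, 0.8], [0.8, 0.9], [0.95, 0.9]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
prediction = committee_predict([0.9, 0.9], profiles, labels)
```

Using an odd number of committee members with binary labels avoids tied votes. A real analysis would replace the toy profiles with normalized expression data and evaluate on held-out patients or an independent cohort.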
