Identification of Genes Involved in Breast Cancer Metastasis by Integrating Protein-Protein Interaction Information with Expression Data

The selection of genes relevant to breast cancer metastasis is critical for the treatment and prognosis of cancer patients. Although much effort has been devoted to gene selection procedures based on different statistical analysis methods or computational techniques, the interpretation of the variables in the resulting survival models has so far been limited. This article proposes a new Random Forest (RF)-based algorithm to identify important variables closely related to breast cancer metastasis. The algorithm is based on the importance scores of two variable selection algorithms: the mean decrease Gini (MDG) criterion of Random Forest and the GeneRank algorithm with protein-protein interaction (PPI) information. We call the new gene selection algorithm PPIRF. The improved prediction accuracy illustrates the reliability and high interpretability of the gene list selected by the PPIRF approach.
