Interaction-Based Feature Selection for Uncovering Cancer Driver Genes Through Copy Number-Driven Expression Level

Driver gene selection is crucial for understanding the heterogeneous system of cancer. To identify cancer driver genes, various statistical strategies have been proposed, and L1-type regularization methods in particular have drawn a large amount of attention. However, these statistical approaches have been developed purely from algorithmic and statistical points of view, and existing studies have applied them to genomic data analysis without considering biological knowledge. We consider a statistical strategy that incorporates biological knowledge to identify cancer driver genes. Copy number alterations are considered to drive cancer pathogenesis, and regions of strong interaction between copy number alterations and expression levels are known as a tumor-related symptom. We incorporate the influence of copy number alterations on expression levels into the cancer driver gene-selection process. To quantify the dependence of expression levels on copy number alterations, we consider [Formula: see text] and [Formula: see text] effects of copy number alterations on expression levels of genes, and incorporate this symptom of tumor pathogenesis into the gene-selection procedure. We then propose an interaction-based feature-selection strategy based on adaptive L1-type regularization and random lasso procedures. The proposed method imposes a large penalty on genes with a low dependency between the two features, so the coefficients of those genes are estimated to be small or exactly 0. This implies that the proposed method can provide biologically relevant results in cancer driver gene selection. Monte Carlo simulations and analysis of The Cancer Genome Atlas (TCGA) data show that the proposed strategy is effective for high-dimensional genomic data analysis. Furthermore, the proposed method provides reliable and biologically relevant results for cancer driver gene selection in TCGA data analysis.
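
The weighting idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the dependency between the two features is measured by the absolute Pearson correlation between a gene's copy number and its expression level (the abstract's actual dependency measures are not recoverable here), and it omits the random lasso step and the intercept. All function names, the parameters `gamma`, `eps`, and `lam`, and the synthetic data are hypothetical.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv) if suu > 0 and svv > 0 else 0.0


def dependency_weights(copy_number, expression, gamma=1.0, eps=1e-3):
    """Per-gene penalty weight: large when copy number and expression are weakly related."""
    # Assumption: dependency is |Pearson r|; the source's own measure is not shown.
    return [1.0 / (abs(pearson(c, x)) + eps) ** gamma
            for c, x in zip(copy_number, expression)]


def soft_threshold(z, t):
    """Shrink z toward zero by t, returning zero when |z| <= t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def weighted_lasso(columns, y, weights, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j w_j |b_j| (no intercept)."""
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)  # residual y - Xb, kept up to date after every coefficient change
    norms = [sum(v * v for v in col) / n for col in columns]
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            if norms[j] == 0.0:
                continue
            # Inner product of column j with the partial residual (gene j excluded).
            rho = sum(c * r for c, r in zip(col, resid)) / n + norms[j] * beta[j]
            new = soft_threshold(rho, lam * weights[j]) / norms[j]
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new
    return beta


# Synthetic data, stored gene-wise: each inner list holds one gene across n samples.
random.seed(0)
n = 100
cn0 = [random.gauss(0, 1) for _ in range(n)]
expr0 = [c + 0.1 * random.gauss(0, 1) for c in cn0]  # expression tracks copy number
cn1 = [random.gauss(0, 1) for _ in range(n)]
expr1 = [random.gauss(0, 1) for _ in range(n)]       # expression unrelated to copy number
y = [2.0 * x + 0.1 * random.gauss(0, 1) for x in expr0]

weights = dependency_weights([cn0, cn1], [expr0, expr1])
beta = weighted_lasso([expr0, expr1], y, weights, lam=0.1)
print("weights:", weights)
print("beta:", beta)
```

In this toy setting the gene whose expression follows its copy number receives a small weight and keeps a nonzero coefficient, while the weakly dependent gene receives a larger weight and is shrunk to zero. The exponent `gamma` controls how sharply low dependency is penalized.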