TILD: A Strategy to Identify Cancer-related Genes Using Title Information in Literature Data

After the genome projects of the 1990s, research on genes advanced rapidly. These studies revealed that genes can cause disease and that the relationships between genes and diseases are important. For this reason, we propose a strategy called TILD that identifies cancer-related genes using title information in literature data. To implement our method, we selected cancer-specific literature data from an online database and extracted genes using text mining. We then classified the extracted genes into two kinds using title information: genes that appear in the title are classified as hub genes, whereas genes that appear in the body are classified as sub genes connected to the hub genes. We repeated this process for each paper to construct cancer-specific local gene networks. In the last step, we constructed a global cancer-specific gene network by integrating all local gene networks and calculated a score for each gene based on an analysis of the global network. We assumed that genes in the title have meaningful relations with cancer and that other genes in the body are related to the title genes. For validation, we compared the top 20 genes inferred by our approach with those inferred by other methods. Our approach found more cancer-related genes than comparable methods.
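
The hub/sub classification and network integration described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the gene lexicon, the naive token matching, the example papers, and in particular the scoring rule (hub appearances plus weighted degree) are assumptions made for illustration, since the abstract does not specify how genes are extracted or how the score is computed.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical gene lexicon; a real system would use a curated gene dictionary.
GENE_LEXICON = {"EGFR", "MTHFR", "STAT3", "CD44", "KRAS"}


def extract_genes(text):
    """Return the lexicon genes mentioned in text (naive token match, an assumption)."""
    tokens = set(re.findall(r"[A-Za-z0-9]+", text))
    return tokens & GENE_LEXICON


def local_network(title, body):
    """Build one paper's local network: title genes are hubs, body-only genes are subs."""
    hubs = extract_genes(title)
    subs = extract_genes(body) - hubs
    # Every hub gene is connected to every sub gene found in the same paper.
    edges = set(product(hubs, subs))
    return hubs, edges


def global_network(papers):
    """Integrate all local networks by counting hub appearances and hub-sub edges."""
    hub_counts = Counter()
    edge_weights = Counter()
    for title, body in papers:
        hubs, edges = local_network(title, body)
        hub_counts.update(hubs)
        edge_weights.update(edges)
    return hub_counts, edge_weights


def score_genes(hub_counts, edge_weights):
    """Assumed scoring rule: hub appearances plus weighted degree in the global network."""
    scores = Counter(hub_counts)
    for (hub, sub), weight in edge_weights.items():
        scores[hub] += weight
        scores[sub] += weight
    return scores


# Hypothetical input: (title, body) pairs standing in for cancer-specific papers.
papers = [
    ("EGFR expression in colon cancer", "We also measured KRAS and STAT3."),
    ("STAT3 signalling in colon cancer", "EGFR was co-expressed."),
]
hub_counts, edge_weights = global_network(papers)
scores = score_genes(hub_counts, edge_weights)
top_genes = [gene for gene, _ in scores.most_common(20)]
```

Keeping the edges directed from hub to sub preserves the title/body distinction in the global network, so a different scoring rule could weight hub and sub roles separately.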
