Quantitative measurement of clinic-genomic association for colorectal cancer using literature mining and Google-distance algorithm

A growing number of researchers are re-mining existing biomedical knowledge, focusing on how to establish associations between clinical and genomic data. However, quantitative analysis for specific diseases remains inadequate. Colorectal cancer is one of the malignant tumors whose molecular mechanism is relatively well understood, which makes it a suitable object of study. This paper proposes a quantitative measurement of clinic-genomic associations for colorectal cancer based on the Google Distance, using the MEDLINE database as the corpus. The method combines several techniques, including mapping clinical and genomic data to MeSH terms and modifying the Normalized Google Distance using a year average. Data from Electronic Medical Records (EMR), Online Mendelian Inheritance in Man (OMIM), and the Genetic Association Database (GAD) were used in this study. A total of 3795 clinic-genomic associations for colorectal cancer, between 67 clinical concepts and 236 genes, were obtained; KEGG pathway analysis showed that 584 of these associations involve genes contained in the colorectal cancer pathway. The results were assessed and interpreted using KEGG and GeneCards, which also yielded new findings. The method is valid for quantitative analysis of biomedical literature, performs well in measuring associations between clinical and genomic data, and can be transferred to research on other diseases.
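The Normalized Google Distance (NGD) at the core of the method can be sketched as follows. This is a minimal sketch, not the authors' implementation. It assumes that document frequencies are counted over MeSH-indexed records (such as MEDLINE citations) rather than web hits, and it uses the standard formula from Cilibrasi and Vitányi: NGD(x, y) = [max(log f(x), log f(y)) - log f(x, y)] / [log N - min(log f(x), log f(y))]. Here f(x) is the number of documents containing term x, f(x, y) is the number containing both terms, and N is the corpus size. The abstract does not specify how the "year average" modification works, so `year_averaged_ngd` is a hypothetical reading: it computes NGD within each publication year and averages the finite per-year values. The toy records are made up for illustration.

```python
import math
from collections import defaultdict


def ngd(fx: int, fy: int, fxy: int, n: int) -> float:
    """Standard Normalized Google Distance (Cilibrasi & Vitanyi).

    fx, fy -- number of documents indexed with term x / term y
    fxy    -- number of documents indexed with both terms
    n      -- total number of documents in the corpus
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur: treat as infinitely distant
    lx, ly, lxy, ln = math.log(fx), math.log(fy), math.log(fxy), math.log(n)
    denominator = ln - min(lx, ly)
    if denominator == 0:
        return 0.0  # both terms occur in every document, so they are indistinguishable
    return (max(lx, ly) - lxy) / denominator


def year_averaged_ngd(records, x: str, y: str) -> float:
    """HYPOTHETICAL 'year average' variant (an assumption, not from the paper):
    NGD is computed within each publication year, and the finite per-year
    distances are averaged."""
    docs_by_year = defaultdict(list)
    for year, terms in records:
        docs_by_year[year].append(terms)

    per_year = []
    for docs in docs_by_year.values():
        fx = sum(x in terms for terms in docs)
        fy = sum(y in terms for terms in docs)
        fxy = sum(x in terms and y in terms for terms in docs)
        d = ngd(fx, fy, fxy, len(docs))
        if math.isfinite(d):
            per_year.append(d)
    return sum(per_year) / len(per_year) if per_year else math.inf


# Toy, made-up records: (publication year, set of MeSH headings / gene symbols).
records = [
    (2010, {"Colorectal Neoplasms", "APC"}),
    (2010, {"Colorectal Neoplasms"}),
    (2010, {"APC"}),
    (2010, {"Liver Neoplasms"}),
    (2011, {"Colorectal Neoplasms", "APC"}),
    (2011, {"Colorectal Neoplasms", "APC"}),
    (2011, {"Liver Neoplasms"}),
    (2011, {"Liver Neoplasms"}),
]

if __name__ == "__main__":
    # ln(4) / ln(100), i.e. approximately log10(2) = 0.301
    print(ngd(10, 20, 5, 1000))
    # per-year NGD is 1.0 (2010) and 0.0 (2011), so the mean is approximately 0.5
    print(year_averaged_ngd(records, "Colorectal Neoplasms", "APC"))
```

A smaller distance means a stronger literature association. A filtering step like the KEGG pathway check described above would then keep only the pairs whose gene belongs to the colorectal cancer pathway.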
