Identifying Liver Cancer and Its Relations with Diseases, Drugs, and Genes: A Literature-Based Approach

In biomedicine, scientific literature is a valuable source for knowledge discovery. Mining knowledge from textual data has become an increasingly important task as the volume of scientific literature grows at an unprecedented rate. In this paper, we propose a framework for examining a given disease based on existing information in the scientific literature. Disease-related entities, namely diseases, drugs, and genes, are systematically extracted and analyzed using a three-level network-based approach. A paper-entity network and an entity co-occurrence network (macro-level) are explored and used to construct six entity-specific networks (meso-level). Important diseases, drugs, and genes, as well as salient entity relations (micro-level), are identified from these networks. Results obtained from this literature-based mining can assist clinical applications.
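The three levels described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `paper_entities` data, the function names, the reading of the "six entity-specific networks" as the six pairings of the three entity types, and the use of weighted degree as a simple stand-in for whatever importance measure the paper actually applies.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy paper-entity network: paper ID -> set of (type, name) entities.
paper_entities = {
    "p1": {("disease", "liver cancer"), ("drug", "sorafenib"), ("gene", "TP53")},
    "p2": {("disease", "liver cancer"), ("disease", "cirrhosis"), ("gene", "TP53")},
    "p3": {("disease", "liver cancer"), ("drug", "sorafenib")},
}


def cooccurrence_network(papers):
    """Macro level: count the papers in which each unordered entity pair co-occurs."""
    edges = Counter()
    for entities in papers.values():
        # Sorting gives each pair one canonical orientation.
        for a, b in combinations(sorted(entities), 2):
            edges[(a, b)] += 1
    return edges


def typed_subnetwork(edges, type_a, type_b):
    """Meso level: keep only edges linking an entity of type_a to one of type_b."""
    wanted = sorted((type_a, type_b))
    return {
        pair: weight
        for pair, weight in edges.items()
        if sorted((pair[0][0], pair[1][0])) == wanted
    }


def weighted_degree(edges):
    """Micro level (stand-in measure): sum of edge weights incident to each entity."""
    degree = defaultdict(int)
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return dict(degree)


edges = cooccurrence_network(paper_entities)
disease_gene = typed_subnetwork(edges, "disease", "gene")
ranking = sorted(weighted_degree(edges).items(), key=lambda kv: -kv[1])
```

Calling `typed_subnetwork` for each of the six type pairings (disease-disease, disease-drug, disease-gene, drug-drug, drug-gene, gene-gene) would yield one network per pairing under this reading. A different centrality measure could replace `weighted_degree` without changing the rest of the sketch.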
