GLAD4U: deriving and prioritizing gene lists from PubMed literature

Background: Answering questions such as "Which genes are related to breast cancer?" usually requires retrieving relevant publications through the PubMed search engine, reading these publications, and creating gene lists. This process is not only time-consuming, but also prone to errors.

Results: We report GLAD4U (Gene List Automatically Derived For You), a new, free web-based gene retrieval and prioritization tool. GLAD4U takes advantage of existing resources of the NCBI to ensure computational efficiency. The quality of gene lists created by GLAD4U for three Gene Ontology (GO) terms and three disease terms was assessed using corresponding "gold standard" lists curated in public databases. For all queries, GLAD4U gene lists showed very high recall but low precision, leading to a low F-measure. By comparison, EBIMed's recall was consistently lower than GLAD4U's, but its precision was higher. To present the most relevant genes at the top of a list, we studied two prioritization methods, one based on publication count and one on the hypergeometric test, and compared the ranked lists and those generated by EBIMed to the gold standards. Both GLAD4U methods outperformed EBIMed for all queries based on a variety of quality metrics. Moreover, the hypergeometric method achieved better performance when genes with low scores were removed by thresholding. In addition, manual examination suggests that many false positives could be explained by the incompleteness of the gold standards. The GLAD4U user interface accepts any valid PubMed query. Its output page displays the ranked gene list, information associated with each gene, and chronologically ordered supporting publications, along with a summary of the run and links for file export, functional enrichment analysis, and protein interaction network analysis.

Conclusions: GLAD4U has a high overall recall. Although precision is generally low, the prioritization methods successfully rank truly relevant genes at the top of the lists to facilitate efficient browsing. GLAD4U is simple to use, and its interface can be found at: http://bioinfo.vanderbilt.edu/glad4u.
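
The evaluation and ranking ideas mentioned in the abstract can be illustrated with a minimal sketch. This is not GLAD4U's code: the abstract does not give the exact parameterization of the hypergeometric test, so the version below assumes the usual enrichment setup (publications retrieved by the query versus all publications linked to a gene in a background corpus). All gene names, counts, and function names are hypothetical.

```python
from math import comb


def precision_recall_f1(retrieved, gold):
    """Compare a retrieved gene list against a gold-standard list."""
    retrieved, gold = set(retrieved), set(gold)
    true_pos = len(retrieved & gold)
    precision = true_pos / len(retrieved) if retrieved else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N items, K marked, n drawn)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical toy data (assumption, not from the paper):
# N = publications in the background corpus, n = publications hit by the query.
N, n = 1000, 20
# gene -> (k = query publications linked to the gene,
#          K = all corpus publications linked to the gene)
gene_counts = {
    "GENE_A": (10, 15),   # rarely studied, but concentrated in the query hits
    "GENE_B": (10, 500),  # heavily studied everywhere, so 10 hits is expected
    "GENE_C": (2, 3),     # few publications, most of them in the query hits
}

# Ranking by raw publication count (largest k first).
by_count = sorted(gene_counts, key=lambda g: gene_counts[g][0], reverse=True)

# Ranking by hypergeometric p-value (smallest first).
p_values = {g: hypergeom_upper_tail(k, N, K, n) for g, (k, K) in gene_counts.items()}
by_pvalue = sorted(p_values, key=p_values.get)

# Thresholding: keep only genes whose p-value passes an assumed cutoff.
kept = [g for g in by_pvalue if p_values[g] < 0.01]

precision, recall, f1 = precision_recall_f1({"A", "B", "C", "D"}, {"A", "B", "E"})
```

In this toy example the count-based ranking cannot separate GENE_A from GENE_B, whereas the hypergeometric score pushes the widely studied GENE_B to the bottom, and a cutoff then drops it. That is one way to read the abstract's remark that thresholding low-scoring genes improves the ranked list.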
