PubServer: literature searches by homology

PubServer, available at http://pubserver.burnham.org/, is a tool that automatically collects, filters and analyzes publications associated with groups of homologous proteins. Protein entries in databases such as the Entrez Protein database at NCBI contain information about publications associated with a given protein. The scope of these publications varies widely: they include studies focused on the biochemical functions of individual proteins, but also reports from genome sequencing projects that introduce tens of thousands of proteins. Collecting and analyzing publications related to sets of homologous proteins helps in the functional annotation of novel protein families and in improving the annotations of well-studied protein families or individual genes. However, performing such collection and analysis manually is tedious and time-consuming. PubServer automatically collects identifiers of homologous proteins using PSI-BLAST, retrieves literature references from the corresponding database entries and filters out publications unlikely to contain useful information about individual proteins. It also compiles simple vocabulary statistics from titles, abstracts and MeSH terms to identify the most frequently occurring keywords, which may help to quickly identify common themes in these publications. The filtering criteria applied to the collected publications are user-adjustable. The results are presented as an interactive page that allows re-filtering and alternative presentations of the output.
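
The filtering and keyword-counting steps described above can be illustrated with a minimal Python sketch. It is not PubServer's implementation: the function names, the `max_proteins` threshold, the stopword list and the sample records are all hypothetical, and the single filter shown (dropping publications linked to many protein entries, as genome sequencing reports typically are) is only one plausible criterion.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real tool would use a much larger one.
STOPWORDS = frozenset(
    {"a", "an", "and", "by", "for", "from", "in", "is", "of", "on", "the", "to", "with"}
)


def filter_publications(pub_to_proteins, max_proteins=50):
    """Keep publications linked to at most `max_proteins` protein entries.

    Assumption: a publication cited by very many protein entries is likely
    a genome-scale report rather than a study of an individual protein.
    """
    return {
        pub: proteins
        for pub, proteins in pub_to_proteins.items()
        if len(proteins) <= max_proteins
    }


def keyword_counts(texts, stopwords=STOPWORDS):
    """Count, for each word, the number of texts in which it occurs."""
    counts = Counter()
    for text in texts:
        # Use a set so each word is counted once per text.
        words = set(re.findall(r"[a-z][a-z0-9-]+", text.lower()))
        counts.update(words - stopwords)
    return counts


# Made-up records standing in for references collected from database entries.
pub_to_proteins = {
    "pub_A": {"prot_1", "prot_2"},
    "pub_B": {f"prot_{i}" for i in range(1000)},  # genome-scale report
    "pub_C": {"prot_3"},
}
titles = {
    "pub_A": "A conserved sporulation protein inhibits DNA replication",
    "pub_B": "Complete genome sequence of a soil bacterium",
    "pub_C": "Sporulation protein structure and DNA binding",
}

kept = filter_publications(pub_to_proteins, max_proteins=50)
counts = keyword_counts(titles[pub] for pub in kept)
print(sorted(kept))
print(counts.most_common(5))
```

Because the threshold is an ordinary parameter, re-filtering with a different value only requires calling `filter_publications` again on the same collected records, which mirrors the user-adjustable criteria mentioned above.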
