SeqHound: biological sequence and structure database as a platform for bioinformatics research

Background
SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment.

Results
SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server, through a Perl, C or C++ remote API, or through an optimized local API. It provides the functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries.

Conclusions
The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the SourceForge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit.
