PartiGene: constructing partial genomes

Expressed sequence tags (ESTs) offer a low-cost approach to gene discovery and are being used by an increasing number of laboratories to obtain sequence information for a wide variety of organisms. The challenge lies in processing and organizing these data within a genomic context to facilitate large-scale analyses. Here we present PartiGene, an integrated sequence analysis suite that uses freely available public domain software to (1) process raw trace chromatograms into sequence objects suitable for submission to dbEST; (2) place these sequences within a genomic context; (3) perform customizable first-pass annotation of the data; and (4) present the data as HTML tables and an SQL database resource. PartiGene has been used to create a number of non-model organism database resources, including NEMBASE (http://www.nematodes.org) and LumbriBase (http://www.earthworms.org/). The packages are readily portable, freely available and can be run on simple Linux-based workstations.

Availability: PartiGene is available from http://www.nematodes.org/PartiGene and also forms part of the EST analysis software associated with the Natural Environment Research Council (UK) Bio-Linux project (http://envgen.nox.ac.uk/biolinux.html).
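To make step (1) more concrete, the following is a minimal sketch of one operation that a trace-processing stage typically performs: trimming low-quality bases from both ends of a base-called read and checking that enough sequence remains to be worth submitting. It is illustrative only and is not taken from PartiGene. The function name `trim_low_quality` and the thresholds `min_q` and `min_len` are hypothetical, and the quality values are assumed to be phred-style integer scores, one per base.

```python
from typing import List, Tuple


def trim_low_quality(
    seq: str,
    quals: List[int],
    min_q: int = 20,     # hypothetical quality cutoff, not PartiGene's setting
    min_len: int = 100,  # hypothetical minimum usable read length
) -> Tuple[str, bool]:
    """Trim bases below min_q from both ends of a read.

    Returns the trimmed sequence and a flag saying whether it is
    at least min_len bases long.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have equal length")

    start, end = 0, len(seq)
    # Advance past low-quality bases at the 5' end.
    while start < end and quals[start] < min_q:
        start += 1
    # Step back past low-quality bases at the 3' end.
    while end > start and quals[end - 1] < min_q:
        end -= 1

    trimmed = seq[start:end]
    return trimmed, len(trimmed) >= min_len


if __name__ == "__main__":
    read = "ACGTACGT"
    scores = [5, 10, 30, 30, 30, 30, 8, 3]
    print(trim_low_quality(read, scores, min_q=20, min_len=3))
```

Only the read ends are trimmed here; low-quality bases in the interior of a read are left in place, which a real pipeline might handle differently (for example, with a sliding-window average).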
