ESTPiper – a web-based analysis pipeline for expressed sequence tags

Background: EST sequencing projects are increasing in scale and scope as genome sequencing technologies migrate from core sequencing centers to individual research laboratories. As a result, generating EST data is no longer a bottleneck for investigators. However, processing large amounts of EST data remains a non-trivial challenge for many. Web-based EST analysis tools are proving to be the most convenient option for biologists performing their own analyses, so these tools must keep improving to stay in step with the growing needs of research communities. We have developed a web-based EST analysis pipeline called ESTPiper, which streamlines the typical components of large-scale EST analysis.

Results: The intuitive web interface guides users through each step of base calling, data cleaning, assembly, genome alignment, annotation, analysis of gene ontology (GO), and microarray oligonucleotide probe design. Each step is modularized, so a user can execute the steps separately or together in batch mode. In addition, the user has control over the parameters used by the underlying programs. Extensive documentation of ESTPiper's functionality is embedded throughout the web site to facilitate understanding of the required input and interpretation of the computational results. The user can also download intermediate results and port files to separate programs for further analysis. In addition, our server provides a time-stamped description of the run history for reproducibility. The pipeline can also be installed locally, allowing researchers to modify ESTPiper to suit their own needs.

Conclusion: ESTPiper streamlines the typical process of EST analysis. The pipeline was initially designed in part to support the Daphnia pulex cDNA sequencing project. A web server hosting ESTPiper is provided at http://estpiper.cgb.indiana.edu/ and now supports projects of all sizes. The software is also freely available from the authors for local installations.
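The modular design described in the Results (steps that run separately or in batch, user-supplied parameters, and a time-stamped run history) can be illustrated with a minimal Python sketch. This is not ESTPiper's code: the step names simply follow the order listed in the abstract, the step functions are placeholders that only record that they ran, and `run_pipeline`, `STEPS`, and the `min_overlap` parameter are hypothetical names introduced for illustration.

```python
from datetime import datetime, timezone

# Step order taken from the abstract; everything else here is an assumption.
STEP_ORDER = [
    "base_calling", "cleaning", "assembly", "genome_alignment",
    "annotation", "go_analysis", "probe_design",
]


def make_placeholder(name):
    """Build a stand-in step that appends its own name to the data list."""
    def step(data, **params):
        # A real step would call an external program with `params`.
        return data + [name]
    return step


STEPS = {name: make_placeholder(name) for name in STEP_ORDER}


def run_pipeline(data, selected=None, params=None):
    """Run the selected steps in pipeline order and log each one.

    selected: step names to run (None means all steps, i.e. batch mode).
    params:   optional mapping of step name -> keyword arguments.
    Returns (result, history), where history is a list of time-stamped records.
    """
    selected = STEP_ORDER if selected is None else selected
    params = params or {}
    unknown = [s for s in selected if s not in STEPS]
    if unknown:
        raise ValueError(f"unknown steps: {unknown}")

    history = []
    for name in STEP_ORDER:  # always execute in pipeline order
        if name not in selected:
            continue
        step_params = params.get(name, {})
        data = STEPS[name](data, **step_params)
        history.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "step": name,
            "params": dict(step_params),
        })
    return data, history
```

Calling `run_pipeline([])` runs every step in batch mode, while `run_pipeline([], selected=["cleaning", "assembly"], params={"assembly": {"min_overlap": 40}})` runs two steps with a user-chosen parameter. The returned history records which steps ran, when, and with which settings, which is the kind of information a reproducible run log needs.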
