JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow

Background
Expressed sequence tag (EST) analyses provide a rapid and economical means of identifying candidate genes that may be involved in a particular biological process, and ESTs are useful in many functional genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make analyzing this information a daunting task.

Results
To make this task more manageable, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux) that enables users to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, JUICE allows a branched pipeline to be established, modified and expanded during the course of an EST project.

The web interfaces and tools in JUICE let users visualize the information in a graphical, user-friendly manner. Users may browse or search for sequences and sequence information across all branches of the pipeline, using terms associated with sequence names, annotations or other characteristics that JUICE stores for individual sequences or sequence groups. Users can also create groups of sequences, store them in a clipboard and/or download them for further analysis.

User profiles restrict each user's access according to their role in the project: a user may have view-only access to sequence information, access to annotate sequences and sequence information, or administrative access.

Conclusion
JUICE is an open source data management system developed to help users organize and analyze the large amount of data generated in an EST project workflow. It has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation".
Because it can organize and visualize data from external pipelines, JUICE is a flexible system that should also be useful for other EST and genome projects. JUICE is released under the open source GNU Lesser General Public License (LGPL) and may be downloaded from http://genoma.unab.cl/juice_system/ or http://www.genomavegetal.cl/juice_system/.
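The branched pipeline and role-restricted access described in the Results can be illustrated with a minimal in-memory sketch. JUICE itself is built on Apache, PHP and MySQL; the Python classes below (`Role`, `Step`), their methods (`add_branch`, `annotate`, `search`) and the sequence identifiers are hypothetical and do not reflect JUICE's actual schema or code. The sketch assumes that each pipeline step is a node that can spawn further branches, that only administrators may add branches, that annotators and administrators may annotate, and that a search walks every branch below a given step.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import IntEnum


class Role(IntEnum):
    """Hypothetical access levels mirroring the three user profiles."""
    VIEWER = 1     # may only visualize sequence information
    ANNOTATOR = 2  # may also annotate sequences
    ADMIN = 3      # may also modify the pipeline structure


@dataclass
class Step:
    """Hypothetical node of a branched pipeline (e.g. trimming, clustering)."""
    name: str
    sequences: dict[str, str] = field(default_factory=dict)  # sequence id -> annotation
    children: list[Step] = field(default_factory=list)

    def add_branch(self, user_role: Role, name: str) -> Step:
        # Assumption: only administrators may extend the pipeline with a new branch.
        if user_role < Role.ADMIN:
            raise PermissionError("administrative access required")
        child = Step(name)
        self.children.append(child)
        return child

    def annotate(self, user_role: Role, seq_id: str, text: str) -> None:
        # Assumption: annotators and administrators may annotate; viewers may not.
        if user_role < Role.ANNOTATOR:
            raise PermissionError("annotation access required")
        self.sequences[seq_id] = text

    def search(self, term: str) -> list[tuple[str, str]]:
        """Return (step name, sequence id) pairs whose id or annotation contains
        the term (case-insensitive), searching this step and all branches below it."""
        needle = term.lower()
        hits = [(self.name, sid) for sid, ann in self.sequences.items()
                if needle in sid.lower() or needle in ann.lower()]
        for child in self.children:
            hits.extend(child.search(term))
        return hits


# Usage: a root step with two branches added during the project.
root = Step("raw_reads", {"NE001": "", "NE002": ""})
trimmed = root.add_branch(Role.ADMIN, "trimmed")
trimmed.annotate(Role.ANNOTATOR, "NE001", "putative expansin")
clusters = root.add_branch(Role.ADMIN, "clusters")
clusters.annotate(Role.ADMIN, "contig_7", "expansin-like protein")

# A plain list stands in for the clipboard of selected sequences (an assumption).
clipboard = root.search("expansin")
print(clipboard)  # [('trimmed', 'NE001'), ('clusters', 'contig_7')]
```

Ordering the roles with `IntEnum` makes each profile a superset of the one below it, so a single comparison enforces access; whether JUICE's profiles actually nest this way is an assumption of the sketch.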
