EuroPineDB: a high-coverage web database for maritime pine transcriptome

Background
Pinus pinaster is an economically and ecologically important species that is becoming a woody gymnosperm model. Its enormous genome size makes whole-genome sequencing approaches hard to apply. Therefore, the expressed portion of the genome has to be characterised, and the results and annotations have to be stored in dedicated databases.

Description
EuroPineDB is the largest sequence collection available for a single pine species, Pinus pinaster (maritime pine). It comprises 951 641 raw sequence reads obtained from non-normalised cDNA libraries and high-throughput sequencing of adult (xylem, phloem, roots, stem, needles, cones, strobili) and embryonic (germinated embryos, buds, callus) maritime pine tissues. Using open-source tools, sequences were pre-processed, assembled, and extensively annotated (GO, EC and KEGG terms, descriptions, SNPs, SSRs, ORFs and InterPro codes). As a result, a 10.5× coverage of the P. pinaster genome was obtained and assembled into 55 322 UniGenes. A total of 32 919 (59.5%) of the P. pinaster UniGenes were annotated with at least one description, revealing at least 18 466 different genes. The complete database, which is designed to be scalable, maintainable, and expandable, is freely available at: http://www.scbi.uma.es/pindb/. Its contents can be retrieved by gene libraries, pine species, annotations, UniGenes and microarrays (i.e., the sequences are distributed in two-colour microarrays; this is the only conifer database that provides this information), and the database will be periodically updated. Small assemblies can be viewed using a dedicated visualisation tool that connects them with SNPs. Any sequence or annotation set shown on-screen can be downloaded. Retrieval mechanisms for sequences and gene annotations are provided.

Conclusions
EuroPineDB, with its integrated information, can be used to reveal new knowledge, offers an easy-to-use collection of information to directly support experimental work (including microarray hybridisation), and provides deeper knowledge of the maritime pine transcriptome.
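
The annotation rate quoted in the Description can be checked with simple arithmetic. The following is a minimal sketch that uses only the counts reported above; the variable names are illustrative and are not part of EuroPineDB or its tooling.

```python
# Counts taken directly from the abstract.
total_unigenes = 55_322      # UniGenes assembled for P. pinaster
annotated_unigenes = 32_919  # UniGenes with at least one description

# Fraction of UniGenes carrying at least one description.
annotated_fraction = annotated_unigenes / total_unigenes

# UniGenes left without any description.
unannotated_unigenes = total_unigenes - annotated_unigenes

print(f"Annotated: {annotated_fraction:.1%}")
print(f"Unannotated UniGenes: {unannotated_unigenes}")
```

The result agrees with the 59.5% stated in the text and shows that roughly two in five UniGenes remain without a description.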
