Design, Validation and Annotation of Transcriptome-Wide Oligonucleotide Probes for the Oligochaete Annelid Eisenia fetida

High-density oligonucleotide probe arrays have become an increasingly important tool in genomics studies. For organisms with incomplete genome sequences, one strategy for oligo probe design is to reduce, through bioinformatic analysis and experimental testing, the number of unique probes needed to target every non-redundant transcript. Here we adopted this strategy to design oligo probes for the earthworm Eisenia fetida, a species for which we have sequenced transcriptome-scale expressed sequence tags (ESTs). Our objectives were to identify unique transcripts as targets, to select an optimal and non-redundant oligo probe for each of these target ESTs, and to annotate the selected target sequences. We developed a streamlined and easy-to-follow approach to the design, validation and annotation of species-specific array probes. Four 244K-format oligo arrays were designed using eArray and hybridized to a pooled E. fetida cRNA sample. We identified 63,541 probes with unsaturated signal intensities consistently above the background level. The target transcripts of these probes were annotated using several sequence alignment algorithms, and significant hits were obtained for 37,439 (59%) of the probed targets. We validated and made publicly available these 63.5K oligo probes so that the earthworm research community can use them to pursue ecological, toxicological, and other functional genomics questions. Our approach is efficient, cost-effective and robust because it (1) does not require a major genomics core facility; (2) allows new probes to be easily added and old probes modified or eliminated when new sequence information becomes available; (3) is not bioinformatics-intensive upfront but does provide opportunities for more in-depth annotation of the biological functions of target genes; and (4) allows, if desired, EST orthologs of the UniGene clusters of a reference genome to be identified and selected in order to improve the target-gene specificity of the designed probes.
This approach is particularly applicable to organisms with a wealth of EST sequences but an unfinished genome.
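The validation criterion above (signal unsaturated and consistently above background), combined with the goal of one non-redundant probe per target EST, can be illustrated with a short filter. This is a minimal sketch under stated assumptions, not the authors' pipeline: the record layout, the saturation ceiling, the signal-to-background ratio, and the rule of keeping the passing probe with the highest median signal are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical thresholds: real values depend on the scanner, the
# feature-extraction software and the chosen stringency.
SATURATION = 65_000.0  # intensities at or above this are treated as saturated
MIN_RATIO = 2.0        # signal must be at least MIN_RATIO x local background


@dataclass
class ProbeMeasurement:
    probe_id: str
    target_est: str
    signals: list[float]      # one intensity per array/replicate (assumed layout)
    backgrounds: list[float]  # matching background estimates


def passes(m: ProbeMeasurement) -> bool:
    """True if every measurement is unsaturated and above background by MIN_RATIO."""
    if not m.signals or len(m.signals) != len(m.backgrounds):
        return False
    return all(
        s < SATURATION and s >= MIN_RATIO * b
        for s, b in zip(m.signals, m.backgrounds)
    )


def select_one_per_target(measurements):
    """Keep, for each target EST, the passing probe with the highest median signal."""
    best = {}
    for m in measurements:
        if not passes(m):
            continue
        current = best.get(m.target_est)
        if current is None or median(m.signals) > median(current.signals):
            best[m.target_est] = m
    return best


# Toy data (invented for illustration only).
probes = [
    ProbeMeasurement("p1", "EST_A", [1200.0, 1500.0], [100.0, 120.0]),
    ProbeMeasurement("p2", "EST_A", [3000.0, 2800.0], [110.0, 100.0]),
    ProbeMeasurement("p3", "EST_B", [150.0, 900.0], [100.0, 100.0]),      # fails: 150 < 2 x 100
    ProbeMeasurement("p4", "EST_C", [65_500.0, 40_000.0], [90.0, 95.0]),  # fails: saturated
]
chosen = select_one_per_target(probes)
print({t: m.probe_id for t, m in chosen.items()})  # {'EST_A': 'p2'}
```

Requiring every replicate to pass, rather than an average, is one way to read "consistently above the background level"; a looser rule (e.g. passing on most arrays) would retain more probes at the cost of more noise.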