Automatic Workflow for the Identification of Constitutively-Expressed Genes Based on Mapped NGS Reads

Expression analyses such as quantitative and/or real-time PCR require reference genes for normalization in order to obtain reliable assessments. The expression levels of these reference genes must remain constant across all experimental conditions and/or tissues under study. Traditionally, housekeeping genes have been used for this purpose, but most of them have been reported to vary their expression levels under some experimental conditions. Consequently, the selection of reference genes should be tested and validated in every experimental scenario. Microarray data are not always available for the search for appropriate reference genes, but NGS experiments are increasingly common. For this reason, an automatic workflow based on mapped NGS reads is presented with the aim of obtaining putative reference genes for a given species in the experimental conditions of interest. Calculating the coefficient of variation (CV) and a simple, normalized expression value such as RPKM for each transcript allows filtering and selecting those transcripts that are expressed homogeneously and consistently in all analyzed conditions. This workflow has been tested with Roche/454 reads obtained from olive (Olea europaea L.) pollen and pistil at different developmental stages, as well as with Illumina paired-end reads from two different accessions of Arabidopsis thaliana. Some of the putative candidate reference genes have been experimentally validated.
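The filtering idea can be illustrated with a minimal Python sketch. It is not the workflow's actual implementation: the function names, the input layout (one mapped-read count per condition for each transcript), the use of the population standard deviation, and the thresholds `max_cv` and `min_rpkm` are all assumptions made for illustration. RPKM is computed with its standard definition (reads per kilobase of transcript per million mapped reads), and the CV is the standard deviation divided by the mean.

```python
from statistics import mean, pstdev


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD is an assumption)."""
    m = mean(values)
    if m == 0:
        return float("inf")  # a transcript with no expression is never a candidate
    return pstdev(values) / m


def candidate_reference_transcripts(counts, lengths, totals,
                                    max_cv=0.2, min_rpkm=1.0):
    """Return (transcript, CV) pairs sorted from most to least stable.

    counts:  hypothetical dict, transcript id -> mapped reads per condition
    lengths: hypothetical dict, transcript id -> transcript length in bp
    totals:  total mapped reads per condition, in the same order as counts
    max_cv and min_rpkm are illustrative thresholds, not published values.
    """
    candidates = []
    for transcript, per_condition in counts.items():
        values = [rpkm(c, lengths[transcript], totals[i])
                  for i, c in enumerate(per_condition)]
        if min(values) < min_rpkm:
            continue  # not consistently expressed in every condition
        cv = coefficient_of_variation(values)
        if cv <= max_cv:
            candidates.append((transcript, cv))
    return sorted(candidates, key=lambda pair: pair[1])


# Toy data for two conditions (invented for this example only).
counts = {"t_stable": [100, 200], "t_variable": [10, 400]}
lengths = {"t_stable": 1000, "t_variable": 1000}
totals = [1_000_000, 2_000_000]
print(candidate_reference_transcripts(counts, lengths, totals))
```

In this toy data set, `t_stable` has the same RPKM in both conditions and is retained, whereas `t_variable` changes twentyfold and is discarded. Candidates selected in this way would still need experimental validation.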
