Detecting novel genes with sparse arrays.

Species-specific genes play an important role in defining the phenotype of an organism. However, current gene prediction methods can efficiently find only those genes that share features, such as sequence similarity or general sequence characteristics, with previously known genes. Novel sequencing methods and tiling arrays can find genes without prior information, and they have demonstrated that novel genes can still be found in extensively studied model organisms. Unfortunately, these methods are expensive and thus not easily applicable to, e.g., finding genes that are expressed only under very specific conditions. We demonstrate a method for finding novel genes with sparse arrays, applying it to the 33.9 Mb genome of the filamentous fungus Trichoderma reesei. Our computational method does not require normalisation between arrays, and it takes into account the multiple-testing problem typical of microarray data analysis. In contrast to tiling arrays, which use overlapping probes, only one 25-mer oligonucleotide probe was used for every 100 b. Thus, relatively little space on a microarray slide was required to cover the intergenic regions of the genome. The analysis was done as a by-product of a conventional microarray experiment at no additional cost. We found at least 23 good candidates for novel transcripts that could code for proteins, all of which were expressed at high levels. Candidate genes were found neighbouring ire1, cre1 and many other regulatory genes. Our simple, low-cost method can easily be applied to finding novel species-specific genes without prior knowledge of their sequence properties.
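To give a rough sense of scale for the probe density described in the abstract, the arithmetic can be sketched as below. This is only an illustration: the genome size, probe length and spacing come from the abstract, while the tiling step of 10 b and the helper `probes_needed` are hypothetical. The abstract says probes covered the intergenic regions, whose total length is not given, so the whole-genome figure is an upper bound and not the actual probe count.

```python
def probes_needed(region_length_b: int, spacing_b: int) -> int:
    """Probes required when one probe starts every `spacing_b` bases (rounded up)."""
    return -(-region_length_b // spacing_b)  # integer ceiling division

GENOME_B = 33_900_000    # 33.9 Mb genome size, from the abstract
PROBE_LEN_B = 25         # 25-mer probes, from the abstract
SPARSE_SPACING_B = 100   # one probe per 100 b, from the abstract
TILING_STEP_B = 10       # hypothetical step for an overlapping tiling design (assumption)

# Upper bound: as if the entire genome, not only intergenic regions, were probed.
sparse_probes = probes_needed(GENOME_B, SPARSE_SPACING_B)
tiling_probes = probes_needed(GENOME_B, TILING_STEP_B)

# Fraction of each 100 b window that is directly covered by a 25-mer probe.
covered_fraction = PROBE_LEN_B / SPARSE_SPACING_B

print(f"sparse design (upper bound): {sparse_probes:,} probes")
print(f"hypothetical tiling design:  {tiling_probes:,} probes")
print(f"bases directly probed:       {covered_fraction:.0%}")
```

Under these assumptions the sparse design needs a tenth of the probes of the hypothetical tiling design while directly sampling a quarter of the sequence; the real saving depends on the tiling step and on how much of the genome is intergenic.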
