Computational discovery and annotation of conserved small open reading frames in fungal genomes

Background: Small open reading frames (smORFs/sORFs) that encode short protein sequences are often overlooked during standard gene prediction, leaving many sORFs undiscovered or misannotated. For many genomes, a second round of sORF-targeted gene prediction can complement the existing annotation. In this study, we specifically targeted ORFs encoding 80 amino acid residues or fewer in 31 fungal genomes. We then compared the predicted sORFs and analysed those that are highly conserved among the genomes.

Results: A first set of sORFs, meeting the criterion of at most 80 residues, was identified from existing annotations. A second set was predicted using parameters that specifically searched for ORF candidates of 80 codons or fewer in the exonic, intronic and intergenic sequences of the subject genomes. A total of 1986 conserved sORFs were predicted and characterized.

Conclusions: Numerous open reading frames that could potentially encode polypeptides of 80 amino acid residues or fewer are overlooked during standard gene prediction and annotation. Our results show that additional targeted reannotation can complement standard genome annotation to identify sORFs. Given the lack of, and limitations of, experimental validation, we propose that a simple conservation analysis can provide an acceptable means of ensuring that the predicted sORFs are sufficiently clear of gene prediction artefacts.
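To make the length criterion concrete, the following is a minimal Python sketch of a six-frame scan for ORF candidates of 80 codons or fewer. It is an illustration only, not the prediction pipeline used in the study. The function name `find_sorfs`, the `min_codons` parameter, the restriction to ATG start codons, and the choice to count codons without the stop codon are all assumptions made for this example.

```python
# Illustrative six-frame sORF scan (hypothetical helper, not the study's pipeline).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_sorfs(seq, max_codons=80, min_codons=10):
    """Return a list of (strand, start, end, orf) tuples.

    Assumptions: ORFs start at the first ATG after the previous stop in a
    frame, end at the first in-frame stop codon, and their length in codons
    excludes the stop. Coordinates are 0-based, end-exclusive, and relative
    to the scanned strand (the reverse complement for strand "-").
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i  # open a candidate ORF
                elif codon in STOP_CODONS:
                    n_codons = (i - start) // 3  # stop codon not counted
                    if min_codons <= n_codons <= max_codons:
                        hits.append((strand, start, i + 3, s[start:i + 3]))
                    start = None  # close the ORF and keep scanning
    return hits


if __name__ == "__main__":
    demo = "CC" + "ATG" + "GCT" * 5 + "TAA" + "GG"
    for hit in find_sorfs(demo, min_codons=1):
        print(hit)
```

In practice the scan would be run separately over exonic, intronic and intergenic sequences, and the candidates would then be filtered by conservation across genomes; both steps are outside the scope of this sketch.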
