Are orphan genes protein-coding, prediction artifacts, or non-coding RNAs?

Background: Current genome sequencing projects reveal substantial numbers of taxonomically restricted, so-called orphan genes that lack homology with genes from other evolutionary lineages. However, it is not clear to what extent orphan genes represent real genes, genomic artifacts, or non-coding RNAs.

Results: Here, we use a simple set of assumptions to test the nature of orphan genes. First, a sequence that is transcribed is considered a real biological entity. Second, every sequence that is supported by proteome data or shows a depletion of non-synonymous substitutions is a protein-coding gene. Using genomic, transcriptomic and proteomic data for the nematode Pristionchus pacificus, we show that between 4129 and 7997 (42–81 %) of predicted orphan genes are expressed and between 3818 and 7545 (39–76 %) are under negative selection. In three cases that exhibited strong evolutionary constraint but lacked expression evidence in 14 RNA-seq samples, we could experimentally validate the predicted gene structures. Comparing different data sets to infer selection on orphan gene clusters, we find that the presence of a closely related genome provides the most powerful resource to robustly identify evidence of negative selection. However, even in the absence of other genomic data, the availability of paralogous sequences was enough to show negative selection in 8–10 % of orphan genes.

Conclusions: Our study shows that the great majority of previously identified orphan genes in P. pacificus are indeed protein-coding genes. Even though this work represents a case study on a single species, our approach can be transferred to genomic data of other non-model organisms in order to ascertain the protein-coding nature of orphan genes.
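The two assumptions stated in the Results can be read as a small decision rule. The following Python sketch is only an illustration of that rule, not the authors' pipeline: the names `OrphanEvidence` and `classify_orphan`, the category labels, and the `omega_cutoff` parameter are hypothetical, and the abstract does not state how a depletion of non-synonymous substitutions was tested. Here a dN/dS ratio below 1 is used as a stand-in, whereas a real analysis would typically require a statistical test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OrphanEvidence:
    """Evidence collected for one predicted orphan gene (hypothetical record)."""
    transcribed: bool               # supported by RNA-seq reads
    peptide_support: bool           # supported by proteome (mass spectrometry) data
    dn_ds: Optional[float] = None   # dN/dS ratio; None if no homolog is available


def classify_orphan(ev: OrphanEvidence, omega_cutoff: float = 1.0) -> str:
    """Apply the two assumptions from the abstract as a simple decision rule.

    Assumption (ours, not the source's): dN/dS below `omega_cutoff` counts as
    a depletion of non-synonymous substitutions.
    """
    negative_selection = ev.dn_ds is not None and ev.dn_ds < omega_cutoff
    # Second assumption: proteome support or negative selection => protein-coding.
    if ev.peptide_support or negative_selection:
        return "protein-coding"
    # First assumption: a transcribed sequence is a real biological entity,
    # but its coding status stays open (it could be a non-coding RNA).
    if ev.transcribed:
        return "transcribed, coding status unresolved"
    # No evidence of either kind: possibly a prediction artifact.
    return "unsupported"


if __name__ == "__main__":
    # A constrained gene without expression evidence, as in the three
    # experimentally validated cases mentioned in the abstract.
    print(classify_orphan(OrphanEvidence(False, False, dn_ds=0.2)))
```

Under this reading, a gene can be called protein-coding without any expression evidence, which matches the three constrained but unexpressed cases that were validated experimentally.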
