A new rhesus macaque assembly and annotation for next-generation sequencing analyses

Background: The rhesus macaque (Macaca mulatta) is a key species for advancing biomedical research. Like all draft mammalian genomes, the draft rhesus assembly (rheMac2) has gaps, sequencing errors and misassemblies that have prevented automated annotation pipelines from functioning correctly. Another rhesus macaque assembly, CR_1.0, is also available but is substantially more fragmented than rheMac2, with smaller contigs and scaffolds. Annotations for these two assemblies are limited in completeness and accuracy. High-quality assembly and annotation files are required for a wide range of studies, including expression, genetic and evolutionary analyses.

Results: We report a new de novo assembly of the rhesus macaque genome (MacaM) that incorporates both the original Sanger sequences used to assemble rheMac2 and new Illumina sequences from the same animal. MacaM has a weighted average (N50) contig size of 64 kilobases, more than twice the size of the rheMac2 assembly and almost five times the size of the CR_1.0 assembly. The MacaM chromosome assembly incorporates information from previously unutilized mapping data and preliminary annotation of scaffolds. Independent assessment of the assemblies using Ion Torrent read alignments indicates that MacaM is more complete and accurate than rheMac2 and CR_1.0. We assembled messenger RNA sequences from several rhesus tissues into transcripts, which allowed us to identify a total of 11,712 complete proteins representing 9,524 distinct genes. Using a combination of our assembled rhesus macaque transcripts and human transcripts, we annotated 18,757 transcripts and 16,050 genes with complete coding sequences in the MacaM assembly. Further, we demonstrate that the new annotations provide greatly improved accuracy compared to the current annotations of rheMac2. Finally, we show that the MacaM genome provides an accurate resource for alignment of reads produced by RNA sequence expression studies.

Conclusions: The MacaM assembly and annotation files provide a substantially more complete and accurate representation of the rhesus macaque genome than rheMac2 or CR_1.0 and will serve as an important resource for investigators conducting next-generation sequencing studies with nonhuman primates.

Reviewers: This article was reviewed by Dr. Lutz Walter, Dr. Soojin Yi and Dr. Kateryna Makova.
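
The N50 contig size quoted in the Results is a length-weighted summary statistic: it is the length of the contig at which the cumulative sum of contig lengths, taken from longest to shortest, first reaches half of the total assembly length. The following is a minimal sketch of that calculation under the usual definition, not the authors' own tooling; the function name `n50` and the example lengths are hypothetical and chosen only for illustration.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches at least half of
    the summed length of all contigs.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from the paper.
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))  # total is 300; 80 + 70 reaches 150, so N50 is 70
```

Because the statistic is weighted by length, a few long contigs raise it far more than many short ones, which is why it is commonly used to compare the contiguity of assemblies of the same genome.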
