Automated Update, Revision, and Quality Control of the Maize Genome Annotations Using MAKER-P Improves the B73 RefGen_v3 Gene Models and Identifies New Genes

Reannotation of the maize genome using MAKER-P results in many revised and new gene models.

The large size and relative complexity of many plant genomes make the creation, quality control, and dissemination of high-quality gene structure annotations challenging. In response, we have developed MAKER-P, a fast and easy-to-use genome annotation engine for plants. Here, we report the use of MAKER-P to update and revise the maize (Zea mays) B73 RefGen_v3 annotation build (5b+) in less than 3 h using the iPlant Cyberinfrastructure. MAKER-P identified and annotated 4,466 well-supported protein-coding genes not present in the 5b+ annotation build, added untranslated regions to 1,393 5b+ gene models, identified 2,647 5b+ gene models that lack any supporting evidence (despite the use of large and diverse evidence data sets), identified 104,215 pseudogene fragments, and created 2,522 additional noncoding gene annotations. We also describe a method for de novo training of MAKER-P for the annotation of newly sequenced grass genomes. Collectively, these results lead to the 6a maize genome annotation and demonstrate the utility of MAKER-P for rapid annotation, management, and quality control of grass and other difficult-to-annotate plant genomes.
