MAKER: an easy-to-use annotation pipeline designed for emerging model organism genomes.

We have developed a portable and easily configurable genome annotation pipeline called MAKER. Its purpose is to allow investigators to independently annotate eukaryotic genomes and create genome databases. MAKER identifies repeats, aligns ESTs and proteins to a genome, produces ab initio gene predictions, and automatically synthesizes these data into gene annotations having evidence-based quality indices. MAKER is also easily trainable: outputs of preliminary runs are used to automatically retrain its gene-prediction algorithm, producing higher-quality gene models on subsequent runs. MAKER's inputs are minimal, and its outputs can be used to create a GMOD database. Its outputs can also be viewed in the Apollo genome browser; this feature provides an easy means to annotate, view, and edit individual contigs and BACs without the overhead of a database. As proof of principle, we have used MAKER to annotate the genome of the planarian Schmidtea mediterranea and to create a new genome database, SmedGD. We have also compared MAKER's performance to other published annotation pipelines. Our results demonstrate that MAKER provides a simple and effective means to convert a genome sequence into a community-accessible genome database. MAKER should prove especially useful for emerging model organism genome projects for which extensive bioinformatics resources may not be readily available.
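To make the idea of an evidence-based quality index concrete, the following Python sketch scores a single gene model against spliced EST alignments. It is an illustrative assumption, not MAKER's own quality index, whose exact fields are not described here. The function names (`introns`, `overlaps`, `evidence_support`), the two fractions reported, and the interval convention (1-based, inclusive coordinates on one strand of one contig) are all hypothetical.

```python
from __future__ import annotations

# Hypothetical convention: (start, end) pairs, 1-based and inclusive,
# all on the same strand of a single contig.
Interval = tuple[int, int]


def introns(exons: list[Interval]) -> set[Interval]:
    """Intron intervals implied by consecutive exons of one transcript."""
    ordered = sorted(exons)
    return {(left[1] + 1, right[0] - 1) for left, right in zip(ordered, ordered[1:])}


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def evidence_support(model_exons: list[Interval],
                     est_alignments: list[list[Interval]]) -> dict[str, float | None]:
    """Hypothetical quality index for one gene model.

    introns_confirmed: fraction of the model's introns reproduced exactly by
        at least one spliced EST alignment (None for single-exon models).
    exons_overlapped: fraction of the model's exons overlapped by any
        aligned EST block.
    """
    if not model_exons:
        raise ValueError("gene model has no exons")
    model_introns = introns(model_exons)

    # Pool intron chains and aligned blocks from every EST alignment.
    est_introns: set[Interval] = set()
    est_blocks: list[Interval] = []
    for blocks in est_alignments:
        est_introns |= introns(blocks)
        est_blocks.extend(blocks)

    introns_confirmed = (
        len(model_introns & est_introns) / len(model_introns) if model_introns else None
    )
    exons_overlapped = sum(
        any(overlaps(exon, block) for block in est_blocks) for exon in model_exons
    ) / len(model_exons)
    return {"introns_confirmed": introns_confirmed, "exons_overlapped": exons_overlapped}


# Usage: a three-exon model supported by one partial, spliced EST.
model = [(100, 200), (300, 400), (500, 600)]
ests = [[(150, 200), (300, 380)]]
print(evidence_support(model, ests))
# → {'introns_confirmed': 0.5, 'exons_overlapped': 0.6666666666666666}
```

Reporting the two kinds of support separately, rather than collapsing them into one score, keeps a model with unconfirmed splice sites distinguishable from one that simply lacks EST coverage.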