Tiger Swallowtail Genome Reveals Mechanisms for Speciation and Caterpillar Chemical Defense

Predicting phenotype from genotype represents the epitome of biological questions. Comparative genomics of appropriate model organisms holds the promise of making it possible. However, the high heterozygosity of many eukaryotes currently prohibits assembly of their genomes. Here, we report the 376 Mb genome sequence of Papilio glaucus (Pgl), the first sequenced genome from the family Papilionidae. We obtained the genome from a wild-caught specimen using a cost-effective strategy that overcomes the problem of high (2%) heterozygosity. Comparative analyses suggest the molecular bases of various phenotypic traits, including terpene production in the osmeterium, a Papilionidae-specific organ. Comparison of the Pgl and Papilio canadensis (Pca) transcriptomes reveals mutation hotspots (4% of genes) associated with their divergence: four key circadian clock proteins are enriched in inter-species mutations and are likely responsible for the difference in pupal diapause. Finally, the Pgl genome confirms Papilio appalachiensis as a hybrid of Pgl and Pca, but suggests that it inherited three-quarters of its genes from Pca.
