HaploMerger: Reconstructing allelic relationships for polymorphic diploid genome assemblies

Whole-genome shotgun assembly has been a long-standing issue for highly polymorphic genomes, and the advent of next-generation sequencing technologies has made the issue more challenging than ever. Here we present an automated pipeline, HaploMerger, for reconstructing allelic relationships in a diploid assembly. HaploMerger combines a LASTZ-ChainNet alignment approach with a novel graph-based structure, which helps to untangle allelic relationships between two haplotypes and guides the subsequent creation of reference haploid assemblies. The pipeline provides flexible parameters and schemes to improve the contiguity, continuity, and completeness of the reference assemblies. We show that HaploMerger produces efficient and accurate results in simulations and has advantages over manual curation when applied to real polymorphic assemblies (e.g., 4%-5% heterozygosity). We also used HaploMerger to analyze the diploid assembly of a single Chinese amphioxus (Branchiostoma belcheri) and compared the resulting haploid assemblies with EST sequences, which revealed that the two haplotypes are not only divergent but also highly complementary to each other. Taken together, we have demonstrated that HaploMerger is an effective tool for analyzing and exploiting polymorphic genome assemblies.
