Aquila_stLFR: assembly based variant calling package for stLFR and hybrid assembly for linked-reads

Human diploid genome assembly enables the identification of maternal and paternal genetic variation. Algorithms based on 10x linked-read sequencing have been developed for de novo assembly, variant calling, and haplotyping. Another linked-read technology, single tube long fragment read (stLFR), has recently provided a low-cost, single-tube solution for generating long-fragment data. However, no existing software is available for human diploid assembly and variant calling from stLFR data. We developed Aquila_stLFR to adapt to the key characteristics of stLFR. Aquila_stLFR produces near-perfect diploid assembled contigs, and its assembly-based variant calling detects large numbers of structural variants (SVs) that are not easily spanned by Illumina short reads. Furthermore, the hybrid assembly mode, Aquila_hybrid, allows assembly based on both stLFR and 10x linked-read libraries, demonstrating that the two technologies can complement each other to improve assembly contiguity and variant detection, regardless of the assembly quality of either library on its own. The SVs shared between two independent sequencing datasets from the same individual, together with the SVs from hybrid assemblies, provide a high-confidence profile for further study.

Availability: Source code and documentation are available at https://github.com/maiziex/Aquila_stLFR.
