Lep-MAP3: robust linkage mapping even for low-coverage whole genome sequencing data

Motivation: Accurate and dense linkage maps are useful in family-based linkage and association studies, quantitative trait locus mapping, analysis of genome synteny and other genomic data analyses. Moreover, linkage mapping is one of the best ways to detect errors in de novo genome assemblies, as well as to orient and place assembly contigs within chromosomes. A small mapping cross of tens of individuals will detect many errors where distant parts of the genome are erroneously joined together. With more individuals and markers, even more local errors can be detected and more contigs can be oriented. However, the tools currently available for constructing linkage maps are not well suited to large, possibly low-coverage, whole genome sequencing datasets.

Results: Here we present Lep-MAP3, a linkage mapping software capable of mapping high-throughput whole genome sequencing datasets. Such data allow cost-efficient genotyping of millions of single nucleotide polymorphisms (SNPs) for thousands of individual samples, enabling, among other analyses, comprehensive validation and refinement of de novo genome assemblies. The algorithms of Lep-MAP3 can analyse low-coverage datasets and reduce the need for data filtering and curation on any data. This yields more markers in the final maps with less manual work, even on problematic datasets. We demonstrate that Lep-MAP3 performs very well already at 5x sequencing coverage and, on simulated data, outperforms the fastest available software in accuracy and often in speed. We also construct de novo linkage maps from 7-12x whole-genome data on the Red postman butterfly (Heliconius erato) with almost 3 million markers.

Availability and implementation: Lep-MAP3 is available with the source code under the GNU General Public License from http://sourceforge.net/projects/lep-map3.

Contact: pasi.rastas@helsinki.fi

Supplementary information: Supplementary data are available at Bioinformatics online.
