SwiftLink: parallel MCMC linkage analysis using multicore CPU and GPU

MOTIVATION: Linkage analysis remains an important tool in elucidating the genetic component of disease and has become even more important with the advent of whole exome sequencing, enabling the user to focus on only those genomic regions co-segregating with Mendelian traits. Unfortunately, methods to perform multipoint linkage analysis scale poorly with either the number of markers or the size of the pedigree. Large pedigrees with many markers can only be evaluated with Markov chain Monte Carlo (MCMC) methods that are slow to converge and, as no attempts have been made to exploit parallelism, massively underuse available processing power. Here, we describe SWIFTLINK, a novel application that performs MCMC linkage analysis by spreading the computational burden between multiple processor cores and a graphics processing unit (GPU) simultaneously. SWIFTLINK was designed around the concept of explicitly matching the characteristics of an algorithm with the underlying computer architecture to maximize performance.

RESULTS: We implement our approach using existing Gibbs samplers redesigned for parallel hardware. We applied SWIFTLINK to a real-world dataset, performing parametric multipoint linkage analysis on a highly consanguineous pedigree with EAST syndrome, containing 28 members, where a subset of individuals were genotyped with single nucleotide polymorphisms (SNPs). In our experiments with a four-core CPU and GPU, SWIFTLINK achieves an 8.5× speed-up over the single-threaded version and a 109× speed-up over the popular linkage analysis program SIMWALK.

AVAILABILITY: SWIFTLINK is available at https://github.com/ajm/swiftlink. All source code is licensed under GPLv3.
