iCall: a genotype-calling algorithm for rare, low-frequency and common variants on the Illumina exome array

MOTIVATION
Next-generation genotyping microarrays have been designed with insights from the 1000 Genomes Project and whole-exome sequencing studies. In addition to common variants, these arrays include many variants that are typically present at lower frequencies. Determining the genotypes of these variants from hybridization intensities is challenging because, when minor allele counts are low, the data provide little support for locating the minor alleles. Existing algorithms were designed mainly for calling common variants and are notorious for failing to generate accurate calls for low-frequency and rare variants. Here, we introduce a new calling algorithm, iCall, to call genotypes for variants across the whole spectrum of allele frequencies.

RESULTS
We benchmarked iCall against four of the most commonly used single-stage algorithms (GenCall, optiCall, illuminus and GenoSNP) and against zCall, a post-processing caller that adopts a two-stage calling design. We used normalized hybridization intensities for 12 370 individuals genotyped on the Illumina HumanExome BeadChip, 81 of whom were also whole-genome sequenced. The sequence calls served as the benchmark for genotype-calling accuracy. Our comparisons indicated that iCall outperforms all four single-stage calling algorithms in call rate and concordance, particularly in the accuracy of minor-allele calls, which is the principal concern for rare and low-frequency variants. Using zCall to post-process the output from iCall also gave marginally better performance than the combination of zCall and GenCall.

AVAILABILITY AND IMPLEMENTATION
iCall is implemented in C++ for Linux operating systems and is available for download at http://www.statgen.nus.edu.sg/~software/icall.html.
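To make the low-count problem concrete, the following minimal sketch computes how many individuals are expected in each genotype class among the 12 370 genotyped samples at several minor-allele frequencies. It assumes Hardy-Weinberg equilibrium, which is an illustrative assumption not stated in the abstract, and the function name `expected_genotype_counts` is hypothetical.

```python
# Sketch: expected genotype-class sizes under Hardy-Weinberg equilibrium (HWE).
# HWE is an illustrative assumption; real cohorts can deviate from it.


def expected_genotype_counts(n_individuals, maf):
    """Return expected (major-homozygote, heterozygote, minor-homozygote) counts.

    With minor-allele frequency q and major-allele frequency p = 1 - q,
    HWE gives genotype frequencies p^2, 2pq and q^2.
    """
    q = maf
    p = 1.0 - q
    return (n_individuals * p * p,
            n_individuals * 2.0 * p * q,
            n_individuals * q * q)


N_SAMPLES = 12_370  # number of individuals genotyped in the benchmark

for maf in (0.5, 0.05, 0.005, 0.0005):
    hom_major, het, hom_minor = expected_genotype_counts(N_SAMPLES, maf)
    print(f"MAF={maf:<7} hom-major={hom_major:9.1f} "
          f"het={het:8.1f} hom-minor={hom_minor:7.3f}")
```

At a minor-allele frequency of 0.0005, for example, only about 12 heterozygotes and essentially no minor homozygotes (about 0.003) are expected, so a cluster-based caller has very few intensity points from which to place the heterozygote cluster and usually none for the minor-homozygote cluster.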

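The benchmark is framed in terms of call rate, concordance with the sequence calls, and the accuracy of minor-allele calls. The abstract does not define these metrics, so this minimal sketch assumes one common set of definitions: call rate is the fraction of genotypes that receive a call, concordance is the fraction of jointly called genotypes that agree with sequencing, and minor-allele concordance restricts that comparison to individuals whom sequencing shows carrying at least one minor allele. The function names, genotype coding and data below are hypothetical and are not taken from iCall.

```python
from typing import List, Optional

# Genotypes are coded as the number of minor alleles (0, 1 or 2); None is a no-call.
Genotype = Optional[int]


def call_rate(array_calls: List[Genotype]) -> float:
    """Fraction of genotypes to which the array caller assigned a call."""
    called = sum(1 for g in array_calls if g is not None)
    return called / len(array_calls)


def concordance(array_calls: List[Genotype], seq_calls: List[Genotype],
                minor_only: bool = False) -> float:
    """Fraction of compared genotypes where the array call matches sequencing.

    Positions with a no-call on either platform are skipped. With
    minor_only=True, only positions where sequencing shows at least one minor
    allele are compared, isolating how well minor alleles are recovered.
    """
    matches = total = 0
    for a, s in zip(array_calls, seq_calls):
        if a is None or s is None:
            continue  # no-call on either platform: not comparable
        if minor_only and s == 0:
            continue  # sequencing says major homozygote: excluded from this metric
        total += 1
        matches += int(a == s)
    return matches / total if total else float("nan")


# Hypothetical calls for one rare variant in 10 sequenced individuals.
seq_calls = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
array_calls = [0, 0, 0, 0, 0, 0, 0, None, 0, 1]

print(f"call rate:                {call_rate(array_calls):.2f}")                     # 0.90
print(f"overall concordance:      {concordance(array_calls, seq_calls):.2f}")        # 0.89
print(f"minor-allele concordance: {concordance(array_calls, seq_calls, True):.2f}")  # 0.50
```

In this toy example, overall concordance looks high because most individuals are major homozygotes, while the minor-allele metric reveals that one of the two carriers was missed. This is why the accuracy of minor-allele calls, rather than overall concordance, is the more informative measure for rare and low-frequency variants.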