Conservative Extensions of Linkage Disequilibrium Measures from Pairwise to Multi-loci and Algorithms for Optimal Tagging SNP Selection

We present results on two classes of problems. The first result addresses the long standing open problem of finding unifying principles for Linkage Disequilibrium (LD) measures in population genetics (Lewontin 1964 [10], Hedrick 1987 [8], Devlin and Risch 1995 [5]). Two desirable properties have been proposed in the extensive literature on this topic and the mutual consistency between these properties has remained at the heart of statistical and algorithmic difficulties with haplotype and genome-wide association study analysis. The first axiom is (1) The ability to extend LD measures to multiple loci as a conservative extension of pairwise LD. All widely used LD measures are pairwise measures. Despite significant attempts, it is not clear how to naturally extend these measures to multiple loci, leading to a "curse of the pairwise". The second axiom is (2) The Interpretability of Intermediate Values. In this paper, we resolve this mutual consistency problem by introducing a new LD measure, directed informativeness I (the directed graph theoretic counterpart of the informativeness measure introduced by Halldorsson et al. [6]) and show that it satisfies both of the above axioms. We also show the maximum informative subset of tagging SNPs based on I can be computed exactly in polynomial time for realistic genome-wide data. Furthermore, we present polynomial time algorithms for optimal genome-wide tagging SNPs selection for a number of commonly used LD measures, under the bounded neighborhood assumption for linked pairs of SNPs. One problem in the area is the search for a quality measure for tagging SNPs selection that unifies the LD-based methods such as LD-select (implemented in Tagger, de Bakker et al. 2005 [4], Carlson et al. 2004 [3]) and the information-theoretic ones such as informativeness. We show that the objective function of the LD-select algorithm is the Minimal Dominating Set (MDS) on r2-SNP graphs and show that we can compute MDS in polynomial time for this class of graphs. Although in LD-select the "maximally informative" solution is obtained through a greedy algorithm, and therefore better referred to as "locally maximally informative," we show that in fact, Tagger (LD-select) performs very close to the global maximally informative optimum.

[1]  Cristian S. Calude,et al.  Discrete Mathematics and Theoretical Computer Science , 2003, Lecture Notes in Computer Science.

[2]  R. Lewontin,et al.  On measures of gametic disequilibrium. , 1988, Genetics.

[3]  Russell Schwartz,et al.  Haplotypes and informative SNP selection algorithms: don't block out information , 2003, RECOMB '03.

[4]  Russell Schwartz,et al.  Methods for Inferring Block-Wise Ancestral History from Haploid Sequences , 2002, WABI.

[5]  Russell Schwartz,et al.  SNPs Problems, Complexity, and Algorithms , 2001, ESA.

[6]  J. Pritchard,et al.  Linkage disequilibrium in humans: models and data. , 2001, American journal of human genetics.

[7]  Rita Casadio,et al.  Algorithms in Bioinformatics, 5th International Workshop, WABI 2005, Mallorca, Spain, October 3-6, 2005, Proceedings , 2005, WABI.

[8]  L. Kruglyak,et al.  Patterns of linkage disequilibrium in the human genome , 2002, Nature Reviews Genetics.

[9]  P. Hedrick,et al.  Gametic disequilibrium measures: proceed with caution. , 1987, Genetics.

[10]  C. Carlson,et al.  Selecting a maximally informative set of single-nucleotide polymorphisms for association analyses using linkage disequilibrium. , 2004, American journal of human genetics.

[11]  N. Risch,et al.  A comparison of linkage disequilibrium measures for fine-scale mapping. , 1995, Genomics.

[12]  Shibu Yooseph,et al.  Combinatorial Problems Arising in SNP and Haplotype Analysis , 2003, DMTCS.

[13]  Russell Schwartz,et al.  Optimal Haplotype Block-free Selection of Tagging Snps for Genome-wide Association Studies , 2022 .

[14]  S. Gabriel,et al.  Efficiency and power in genetic association studies , 2005, Nature Genetics.

[15]  Friedhelm Meyer auf der Heide,et al.  Algorithms — ESA 2001 , 2001, Lecture Notes in Computer Science.

[16]  Russell Schwartz,et al.  Robustness of Inference of Haplotype Block Structure , 2003, J. Comput. Biol..