Title Maximum likelihood estimates of two-locus recombination fractions under some natural inequality restrictions

Background: The goal of linkage analysis is to determine the chromosomal location of the gene(s) for a trait of interest, such as a common disease. Three-locus linkage analysis is an important case of multi-locus problems, and solutions can be found analytically for the case of triple backcross mating. However, in existing studies of linkage analysis and gene mapping, some natural inequality restrictions on the parameters have not been considered sufficiently when the maximum likelihood estimates (MLEs) of the two-locus recombination fractions are calculated.
Results: In this paper, we study the estimation of the two-locus recombination fractions for the phase-unknown triple backcross with two offspring in each family, within the framework of some natural and necessary parameter restrictions. A restricted expectation-maximization (EM) algorithm, called REM, is developed. We also consider some extensions in which the proposed REM serves as a unified method.
Conclusion: Our simulation work suggests that the REM performs well in the estimation of recombination fractions and outperforms the current method. We apply the proposed method to a published data set of mouse backcross families.

Background
Molecular genetics has made much progress in recent years, and linkage analysis plays an important role in it. Genetic linkage analysis refers to ordering genetic loci on a chromosome and estimating the genetic distances among them, where these distances are determined on the basis of a statistical phenomenon. Statistical machinery has been used to analyze family data and to detect linkage [1-4]. The degree of linkage can be measured by the recombination fraction. The proportion of recombinant haplotypes (or offspring) potentially produced by a doubly heterozygous parent is called the recombination fraction; it is also the probability that a recombination occurs between the two loci.
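The following is a minimal Python sketch that makes the estimation problem concrete. It covers only the simplest two-locus case and is not the paper's three-locus REM.

It assumes a double backcross in which each offspring can be classified against the parental haplotypes:

- **Phase known.** The MLE of θ under the natural restriction 0 ≤ θ ≤ 1/2 is the recombinant proportion truncated at 1/2.
- **Phase unknown, two offspring per family.** As an illustrative assumption, the three-locus design studied in this paper is reduced here to two loci. Families then fall into two observable types, "concordant" and "discordant", and an EM iteration started below 1/2 stays inside the restriction.

The function names and the concordant/discordant classification are hypothetical, introduced only for this illustration.

```python
import math


def theta_phase_known(n_rec: int, n: int) -> float:
    """Restricted MLE of theta from phase-known backcross offspring.

    The binomial log-likelihood n_rec*log(t) + (n - n_rec)*log(1 - t) is
    concave in t, so its maximizer over [0, 1/2] is n_rec/n truncated at 1/2.
    """
    if n <= 0 or not 0 <= n_rec <= n:
        raise ValueError("need n > 0 and 0 <= n_rec <= n")
    return min(n_rec / n, 0.5)


def theta_phase_unknown_em(n_concordant: int, n_discordant: int,
                           theta0: float = 0.25, tol: float = 1e-12,
                           max_iter: int = 10_000) -> float:
    """EM estimate of theta for a phase-unknown double backcross, two offspring per family.

    Illustrative model (an assumption, not the paper's three-locus model):
    both phases AB/ab and Ab/aB have prior 1/2. A family is 'concordant' when
    both offspring carry haplotypes from the same pair, {AB, ab} or {Ab, aB};
    its likelihood is proportional to theta**2 + (1 - theta)**2. Otherwise it
    is 'discordant', with likelihood proportional to theta * (1 - theta).
    """
    n_fam = n_concordant + n_discordant
    if n_fam <= 0:
        raise ValueError("need at least one family")
    if not 0.0 < theta0 < 0.5:
        raise ValueError("theta0 must lie strictly inside (0, 1/2)")
    theta = theta0
    for _ in range(max_iter):
        # E-step: posterior probability that a concordant family consists of
        # two recombinants rather than two non-recombinants.
        w = theta ** 2 / (theta ** 2 + (1.0 - theta) ** 2)
        # Expected recombinant offspring: 2*w per concordant family,
        # exactly 1 per discordant family.
        expected_rec = 2.0 * w * n_concordant + n_discordant
        # M-step: complete-data binomial MLE over 2 offspring per family,
        # projected onto theta <= 1/2. Starting below 1/2 keeps every iterate
        # at or below 1/2, so the projection only guards against rounding.
        new_theta = min(expected_rec / (2.0 * n_fam), 0.5)
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


print(theta_phase_known(12, 100))         # 0.12
print(theta_phase_unknown_em(80, 20))     # about 0.1127
```

In this toy model the likelihood is symmetric under θ → 1 − θ. Without the restriction, θ and 1 − θ therefore fit the data equally well, and the restriction θ ≤ 1/2 is what makes the estimate identifiable.

The toy model also has a closed form, which is handy for checking the iteration. With D discordant families out of N:

- θ̂ = (1 − sqrt(1 − 2D/N))/2 when D/N ≤ 1/2;
- θ̂ = 1/2 otherwise.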
Published: 4 January 2008. BMC Genetics 2008, 9:1, doi:10.1186/1471-2156-9-1. Received: 11 September 2007. Accepted: 4 January 2008. This article is available from: http://www.biomedcentral.com/1471-2156/9/1 © 2008 Zhou et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Many map functions under different assumptions have been derived [5-7]; with them, genetic distance and recombination fraction can be converted into each other. Human gene mapping is now an important field of science. A critical first step in finding gene loci that contribute to a genetic trait is to demonstrate linkage with a gene of known location (a marker), so estimating recombination fractions is central to linkage analysis. In several respects, three-locus analysis yields more information than two-locus analysis [8-11]. Three-locus linkage analysis is also an important case of multi-locus problems. Methods for detecting multilocus linkage in humans and estimating recombination have been proposed by Lathrop et al. [12] and Lathrop [13]. More recently, Ott [3] considered the estimation of two-locus recombination fractions for phase-unknown triple backcross families with two offspring in each family, and gave analytical expressions for the estimates. Wu et al. [9] considered simultaneous estimation of linkage and linkage phases in outcrossing species. However, as noted by Ott [3], these estimates may not satisfy some natural restrictions that two-locus recombination fractions must in fact satisfy.
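One way to make the "natural restrictions" concrete is sketched below. It is offered as an assumption, not as the paper's exact set.

With locus order A-B-C, a recombination between A and C is observed exactly when one, but not both, of the intervals A-B and B-C recombines. The three two-locus fractions then determine the joint probabilities of recombination in the two intervals. Requiring these joint probabilities to be nonnegative, together with 0 ≤ θ ≤ 1/2, gives a set of linear inequality restrictions.

The Python sketch below checks those restrictions. Haldane's map function (no interference), one of the map functions mentioned above, is used only to generate a consistent example. All function names are hypothetical.

```python
import math


def haldane_theta(d_morgans: float) -> float:
    """Haldane's map function (no interference): map distance in Morgans -> theta."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgans))


def haldane_distance(theta: float) -> float:
    """Inverse of Haldane's map function, valid for 0 <= theta < 1/2."""
    return -0.5 * math.log(1.0 - 2.0 * theta)


def implied_joint_probs(t_ab: float, t_bc: float, t_ac: float) -> dict:
    """Joint probabilities of recombination in intervals A-B and B-C (order A-B-C).

    Key "ij": i = 1 if A-B recombines, j = 1 if B-C recombines. Solves
    t_ab = p10 + p11, t_bc = p01 + p11, t_ac = p10 + p01, with the p's summing to 1.
    """
    return {
        "11": (t_ab + t_bc - t_ac) / 2.0,
        "10": (t_ab + t_ac - t_bc) / 2.0,
        "01": (t_bc + t_ac - t_ab) / 2.0,
        "00": 1.0 - (t_ab + t_bc + t_ac) / 2.0,
    }


def satisfies_restrictions(t_ab: float, t_bc: float, t_ac: float,
                           tol: float = 1e-12) -> bool:
    """True if every theta lies in [0, 1/2] and all implied joint probabilities are >= 0."""
    in_range = all(-tol <= t <= 0.5 + tol for t in (t_ab, t_bc, t_ac))
    probs = implied_joint_probs(t_ab, t_bc, t_ac)
    return in_range and all(p >= -tol for p in probs.values())


# Example: loci 10 cM and 20 cM apart under Haldane's model (distances add).
t_ab, t_bc = haldane_theta(0.10), haldane_theta(0.20)
t_ac = haldane_theta(0.30)
print(satisfies_restrictions(t_ab, t_bc, t_ac))   # True
print(satisfies_restrictions(0.30, 0.05, 0.10))   # False: implied p01 < 0
```

Under no interference, the joint probability of recombination in both intervals is θAB·θBC. The Haldane example therefore also satisfies θAC = θAB + θBC − 2θABθBC.

Unrestricted estimates computed from small samples need not satisfy these inequalities, which is the situation that restricted estimation is meant to handle.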
One may then be unable to obtain a reasonable interpretation of the recombination phenomenon among loci from such estimates. Furthermore, illegitimate estimates of recombination fractions may also reduce the power to detect linkage and can provide misleading evidence to researchers. In addition, the restrictions on recombination fractions considered here are necessary in linkage analysis; for example, they can be applied to determine the locus order on the chromosome [9-11]. Estimating two-locus recombination fractions in three-locus linkage analysis is a constrained-parameter problem; such problems are important and arise in many areas (see [14-17]). However, the methods provided in the literature cannot be applied directly to this genetics problem. Motivated by this unsolved problem, namely that the restrictions on recombination fractions have not been considered sufficiently, in this paper we consider the estimation of the two-locus recombination fractions under some natural and necessary restrictions. We develop a restricted EM algorithm, called REM, which produces estimates that take the natural inequality restrictions on the two-locus recombination fractions into account; the algorithm has been implemented in software. Moreover, the algorithm can easily be generalized to other cases, so the REM serves as a unified approach. Simulation studies show that our new method works well in each scenario and has advantages over the current method; in other words, the major advantages of our method are its robustness and efficiency. An example is used to illustrate the application of our method to linkage analysis.

Methods
Consider three biallelic marker loci A, B and C, in the order A-B-C, with alleles denoted A, a; B, b; and C, c, respectively. Assume a triply homozygous parent abc/abc and a triply heterozygous parent (A/a, B/b, C/c).
For the latter, there are four possible phases: (I) ABC/abc, (II) ABc/abC, (III) AbC/aBc, (IV) Abc/aBC. As Ott [3] pointed out, under regular conditions (linkage equilibrium) each of these phases occurs with probability 1/4. When this is not the case, we let the prior probability of phase i be h_i (i = 1, 2, 3, 4) and give a corresponding feasible approach in a later section. Each offspring necessarily receives haplotype abc from the triply homozygous parent, but receives one of eight possible haplotypes from the heterozygous parent; these are listed in the second column of Table 1. The last four columns of Table 1 give the conditional probabilities of the offspring phenotypes given the parental phase, and the first column gives the code we use for each haplotype. For the phase-unknown triple backcross, each haplotype listed in Table 1 corresponds to exactly one offspring phenotype at the markers. Let θAB, θBC and θAC denote the two-locus recombination fractions between loci A and B, between loci B and C, and between loci A and C, respectively; g00, g01, g10 and g11
Table 1: Conditional haplotype probabilities, given phase, produced by a triply heterozygous parent

[1] Rongling Wu, et al. Statistical Genetics of Quantitative Traits: Linkage, Maps and QTL, 2007.

[2] Zehua Chen. The Full EM Algorithm for the MLEs of QTL Effects and Positions and Their Estimated Variances in Multiple-Interval Mapping, 2005, Biometrics.

[3] Ning-Zhong Shi, et al. The restricted EM algorithm under inequality restrictions on the parameters, 2005.

[4] Rongling Wu, et al. A multilocus likelihood approach to joint modeling of linkage, parental diplotype and gene order in a full-sib family, 2004, BMC Genetics.

[5] Elizabeth A. Thompson, et al. Statistical inference from genetic data on pedigrees, 2003.

[6] Rongling Wu, et al. Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species, 2002, Theoretical Population Biology.

[7] Chuanhai Liu. Estimation of Discrete Distributions with a Class of Simplex Constraints, 2000.

[8] K. Richardson, et al. Genetic control of susceptibility to UV-induced immunosuppression by interacting quantitative trait loci, 2000, Genes and Immunity.

[9] Z. Zeng, et al. Multiple interval mapping for quantitative trait loci, 1999, Genetics.

[10] R. Jansen, et al. High resolution of quantitative traits into multiple loci via interval mapping, 1994, Genetics.

[11] Mokhtar S. Bazaraa, et al. Nonlinear Programming: Theory and Algorithms, 1993.

[12] N. Risch. Linkage strategies for genetically complex traits. I. Multilocus models, 1990, American Journal of Human Genetics.

[13] E. Lander, et al. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps, 1989, Genetics.

[14] F. T. Wright, et al. Order restricted statistical inference, 1988.

[15] J. Ott, et al. Multilocus linkage analysis in humans: detection of linkage and estimation of recombination, 1985, American Journal of Human Genetics.

[16] J. Ott, et al. Strategies for multilocus linkage analysis in humans, 1984, Proceedings of the National Academy of Sciences of the United States of America.

[17] E. A. Thompson, et al. Information gain in joint linkage analysis, 1984, IMA Journal of Mathematics Applied in Medicine and Biology.

[18] R. Dykstra. An Algorithm for Restricted Least Squares Regression, 1983.

[19] J. Felsenstein. A mathematically tractable family of genetic mapping functions with different amounts of interference, 1979, Genetics.

[20] D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion), 1977.