Software and Methods for Analyzing Molecular Genetic Marker Data

LIU, KEJUN. Software and Methods for Analyzing Molecular Genetic Marker Data (under the direction of Dr. Spencer V. Muse). Genetic analysis of molecular markers has allowed biologists to ask a wide variety of questions. This dissertation explores some of the statistical and computational issues that arise in the analysis of genetic marker data. Chapter 1 introduces genetic marker data and gives a brief description of each chapter. Chapter 2 presents the genetic analyses performed on a large data set and discusses the use of microsatellites to describe the maize germplasm and to improve maize germplasm maintenance. Considerable attention is focused on how the maize germplasm is organized and how genetic variation is distributed within it. A novel maximum likelihood method is developed to estimate the historical contributions for maize inbred lines. Chapter 3 covers a new method for selecting an optimal core set of lines from a large germplasm collection. A simulated annealing algorithm for choosing an optimal k-subset is described and evaluated using the maize germplasm as an example; general constraints are incorporated into the algorithm, and the efficiency of the algorithms is compared with that of existing methods. Chapter 4 covers a two-stage strategy that partitions a chromosomal region into blocks with extensive within-block linkage disequilibrium and then selects an optimal subset of SNPs that captures essentially all of the haplotype variation within each block. Population simulations suggest that the recursive bisection algorithm for block partitioning is generally reliable for identifying recombination hotspots. Maximum entropy theory is applied to choose the optimal subset of SNPs. The procedures are evaluated both analytically and by simulation. The final chapter covers a new software package for genetic marker data analysis. The methods implemented in the package are listed, and a brief tutorial illustrates the features of the package.
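The abstract does not state the objective function used for core set selection in Chapter 3, nor the form of its general constraints. As a minimal sketch only, assuming a hypothetical objective (the number of distinct alleles, counted per locus, retained in the core set), a swap neighborhood, and geometric cooling, simulated annealing over k-subsets can look like the following. The names allele_coverage and anneal_core_set, the genotype encoding (one list per line of two-allele tuples, one tuple per locus), and all parameter defaults are illustrative assumptions, not the dissertation's implementation.

```python
import math
import random


def allele_coverage(subset, genotypes):
    """Hypothetical objective: count distinct (locus, allele) pairs carried by the subset."""
    seen = set()
    for line in subset:
        for locus, allele_pair in enumerate(genotypes[line]):
            for allele in allele_pair:
                seen.add((locus, allele))
    return len(seen)


def anneal_core_set(genotypes, k, n_iter=5000, t0=1.0, cooling=0.999, seed=0):
    """Simulated annealing over k-subsets of lines (hypothetical helper).

    A move swaps one selected line for one unselected line; the temperature
    decreases geometrically. Returns the best subset found and its score.
    """
    rng = random.Random(seed)
    lines = range(len(genotypes))
    current = rng.sample(lines, k)
    current_score = allele_coverage(current, genotypes)
    best, best_score = list(current), current_score
    temp = t0
    for _ in range(n_iter):
        outside = [x for x in lines if x not in current]
        if not outside:  # k equals the collection size: nothing to swap
            break
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        score = allele_coverage(candidate, genotypes)
        delta = score - current_score
        # Accept improvements outright; accept worse moves with prob. exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, score
            if score > best_score:
                best, best_score = list(candidate), score
        temp = max(temp * cooling, 1e-12)  # keep the temperature strictly positive
    return sorted(best), best_score
```

Because the best subset seen so far is tracked separately from the current state, an accepted downhill move late in the run cannot discard a good solution.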
Chapter 5 also describes a new method for estimating population-specific F-statistics and an extended algorithm for estimating haplotype frequencies.
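The abstract does not describe the extended haplotype-frequency algorithm. For orientation only, the sketch below shows the standard expectation-maximization (EM) approach in its simplest case, two biallelic loci, where only double heterozygotes have ambiguous phase. The function name em_two_locus and the 0/1/2 genotype coding are assumptions for illustration; this is not the chapter's extended algorithm.

```python
def em_two_locus(genotypes, n_iter=200, tol=1e-10):
    """EM estimates of two-locus haplotype frequencies from unphased genotypes.

    Each genotype is (g1, g2): copies of allele 1 at two biallelic loci (0, 1 or 2).
    Returns frequencies of haplotypes 00, 01, 10, 11 (allele at locus 1, locus 2).
    """
    if not genotypes:
        raise ValueError("need at least one genotype")
    fixed = [0.0] * 4  # haplotype counts from individuals with unambiguous phase
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase unknown: 00/11 or 01/10
            continue
        # At most one locus is heterozygous, so pairing alleles in order is exact.
        alleles1 = [0, 1] if g1 == 1 else [g1 // 2] * 2
        alleles2 = [0, 1] if g2 == 1 else [g2 // 2] * 2
        for a, b in zip(alleles1, alleles2):
            fixed[2 * a + b] += 1
    n_haplotypes = 2 * len(genotypes)
    freqs = [0.25] * 4
    for _ in range(n_iter):
        # E-step: posterior probability that a double heterozygote is 00/11.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        counts = list(fixed)
        counts[0] += n_double_het * p_cis
        counts[3] += n_double_het * p_cis
        counts[1] += n_double_het * (1 - p_cis)
        counts[2] += n_double_het * (1 - p_cis)
        # M-step: frequencies are expected counts divided by the 2n haplotypes.
        new = [c / n_haplotypes for c in counts]
        converged = max(abs(x - y) for x, y in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs
```

With no double heterozygotes the estimate reduces to simple haplotype counting; with many of them, the result can depend on the starting frequencies, which is one reason practical implementations try several starting points.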