Title: PSEUDOMARKER 2.0: efficient computation of likelihoods using NOMAD

Background: PSEUDOMARKER is a software package that performs joint linkage and linkage disequilibrium analysis between a marker and a putative disease locus. A key feature of PSEUDOMARKER is that it can combine case-control samples and pedigrees of varying structure in a single unified analysis. To do so, it maximizes the full likelihood of the data over the recombination fraction and over either the marker allele frequencies or the marker allele frequencies conditional on the disease allele.

Results: The new version 2.0 uses the software package NOMAD to maximize likelihoods, generally reaching comparable or better optima with far fewer evaluations of the likelihood functions.

Conclusions: After substantial modification to use modern optimization methods, PSEUDOMARKER version 2.0 is more robust and substantially faster than version 1.0. NOMAD may be useful in other bioinformatics problems in which complex likelihood functions are optimized.

Background

PSEUDOMARKER [1] is a package that localizes trait-predisposing loci in the genome by performing statistical tests using a putative disease locus and a series of markers. Genomic localization of genes that affect some phenotype is based on tests of independence of disease phenotypes from the genotypes of a genome-spanning set of markers. Many “association tests” try to detect statistical relationships between disease phenotypes and marker genotypes directly, by sampling large numbers of cases and controls or very small families. Such tests confound the statistical relationship between marker alleles and the genotypes at a putative nearby disease locus with the statistical relationship between the same markers and the phenotype. This confounding is unavoidable for case-control data because of the limited degrees of freedom, but these relationships can and should be modeled explicitly when analyzing larger and more heterogeneous pedigree sets.
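The confounding described above can be made concrete with a toy two-locus calculation. The sketch below is an illustration under stated assumptions, not PSEUDOMARKER's likelihood: it assumes one biallelic marker (alleles M/m), one biallelic disease locus (alleles D/d), Hardy-Weinberg equilibrium with random union of haplotypes, LD measured by the coefficient D = P(MD) - P(M)P(D), and penetrances that depend only on the number of D alleles. All function names and parameter values are hypothetical. It computes allele frequencies in affected and unaffected individuals and checks that the case-control difference at the marker equals an LD term times the case-control difference at the disease locus.

```python
from itertools import product

# Each haplotype is a (marker allele, disease allele) pair.
HAPLOTYPES = [("M", "D"), ("M", "d"), ("m", "D"), ("m", "d")]


def haplotype_freqs(p_M, p_D, ld):
    """Haplotype frequencies from allele frequencies and the LD coefficient
    ld = P(MD) - P(M)P(D). The caller must keep all four values >= 0."""
    return {
        ("M", "D"): p_M * p_D + ld,
        ("M", "d"): p_M * (1 - p_D) - ld,
        ("m", "D"): (1 - p_M) * p_D - ld,
        ("m", "d"): (1 - p_M) * (1 - p_D) + ld,
    }


def allele_freq_given_status(freqs, penetrance, affected, locus, allele):
    """P(a random chromosome carries `allele` at `locus` | affection status).

    locus 0 is the marker, locus 1 the disease locus. Assumes random union
    of haplotypes and penetrance[k] = P(affected | k copies of D)."""
    num = den = 0.0
    for h1, h2 in product(HAPLOTYPES, repeat=2):
        n_D = (h1[1] == "D") + (h2[1] == "D")
        p_status = penetrance[n_D] if affected else 1 - penetrance[n_D]
        weight = freqs[h1] * freqs[h2] * p_status
        num += weight * ((h1[locus] == allele) + (h2[locus] == allele)) / 2
        den += weight
    return num / den


def case_control_difference(freqs, penetrance, locus, allele):
    """Allele frequency in cases minus allele frequency in controls."""
    return (allele_freq_given_status(freqs, penetrance, True, locus, allele)
            - allele_freq_given_status(freqs, penetrance, False, locus, allele))


if __name__ == "__main__":
    p_M, p_D, ld = 0.3, 0.1, 0.02
    penetrance = (0.05, 0.2, 0.6)  # hypothetical, for 0, 1, 2 copies of D
    freqs = haplotype_freqs(p_M, p_D, ld)
    marker_diff = case_control_difference(freqs, penetrance, 0, "M")
    disease_diff = case_control_difference(freqs, penetrance, 1, "D")
    # Marker difference = (LD term) x (disease-locus difference).
    factorized = ld / (p_D * (1 - p_D)) * disease_diff
    print(f"observed {marker_diff:.6f}, factorized {factorized:.6f}")
```

In this toy model the marker shows no case-control difference when ld = 0, whatever the penetrances, and a nonzero difference only tells us about the product of the LD term and the penetrance-dependent term. Allele counts from cases and controls therefore cannot separate the two, which is the degrees-of-freedom limitation mentioned above.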
PSEUDOMARKER performs a full likelihood analysis under a specified model of the relationship between disease phenotypes and underlying genotypes. In pedigree data, one can test for genetic linkage as the preferential cosegregation of a marker or a haplotype with disease, family by family; the marker genotype that cosegregates with the disease can differ from family to family. In either pedigree data or case-control data, one can test for linkage disequilibrium (LD) between a marker and a putative disease locus as the preferential co-occurrence of a specific genotype at the marker with a genotype at the disease locus. By using a full likelihood model, PSEUDOMARKER can combine the analysis of case-control (singleton) data and pedigree data of arbitrary size in one unified testing framework. We directly analyze linkage and LD between marker and disease genotypes, integrating over all possible genotypes at the putative two-allele disease-predisposing locus for all individuals, under an explicit model of the genotype-phenotype relationship. PSEUDOMARKER version 1 maximizes several likelihood functions [1] using a generalized pattern search (GPS) algorithm [2] implemented in a custom version of the ILINK [3] program. Previously, we showed that PSEUDOMARKER, using GPS likelihood estimates, performed well in detecting linkage and LD, outperforming several competing genetic analysis programs as measured by power or false-positive rate [4].

*Correspondence: gertz@ncbi.nlm.nih.gov
1National Center for Biotechnology Information, NIH, DHHS, Bethesda, MD, USA
Full list of author information is available at the end of the article

© 2014 Gertz et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Gertz et al. BMC Bioinformatics 2014, 15:47 http://www.biomedcentral.com/1471-2105/15/47

The running time of PSEUDOMARKER depends on the number of times the optimization algorithm evaluates any likelihood function. Each evaluation involves computation over one or, often, several pedigrees for fixed values of certain parameters, which may include the recombination fraction and marker allele frequencies. ILINK computes these likelihoods using a peeling method that generalizes the Elston-Stewart algorithm [5]. Computation time depends strongly on the pedigree structure and the number of untyped founders. Reducing the number of likelihood function evaluations would allow more samples, larger and more complex pedigrees, or a greater density of markers to be analyzed in a reasonable amount of time.

Although the GPS method [2] was more robust than the older line search method implemented in all previous versions of ILINK, we reasoned that the number of likelihood evaluations might be reduced by instead using a newer algorithm known to outperform GPS on some other optimization problems. Mesh Adaptive Direct Search (MADS) [6] is a framework for a class of derivative-free algorithms designed to supersede the GPS method. MADS is conceptually similar to GPS but uses a richer set of search directions, which gives it better theoretical convergence properties. The NOMAD software package [7] is a high-quality, open-source C++ implementation of MADS algorithms, in use at universities and companies around the world [8-11]. NOMAD is robust [12] and has a wide range of functionality, including handling of general nonlinear constraints, biobjective optimization, parallelism, and the restriction of variables to integer or Boolean values [13]. We describe PSEUDOMARKER 2.0, which uses a customized version of ILINK that calls NOMAD to maximize likelihoods.
We show that NOMAD is more effective at finding optima than GPS, while requiring fewer evaluations of the likelihood function.
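Because running time is governed by the number of likelihood evaluations, it helps to see where those evaluations come from in a pattern search. The sketch below is a minimal compass (coordinate) search, the simplest instance of a generalized pattern search; it is an illustrative assumption, not the GPS code in ILINK and not NOMAD's MADS implementation. It polls only the fixed directions ±e_i, rejects trial points outside the parameter box before evaluating them (an "extreme barrier" in MADS terminology), and halves the step after every unsuccessful poll. The toy objective, a sum of two binomial log-likelihoods in a recombination fraction theta in [0, 0.5] and an allele frequency q in [0, 1], and all function names are hypothetical.

```python
import math


def compass_search(f, x0, lower, upper, step=0.25, min_step=1e-6, max_evals=10_000):
    """Maximize f over the box [lower, upper] by polling x +/- step * e_i.

    Accepts the first improving poll point (opportunistic polling) and halves
    the step after a poll in which no direction improves f. Points outside
    the box are rejected without calling f (extreme barrier).
    Returns (best point, best value, number of calls to f)."""
    evals = 0

    def evaluate(x):
        nonlocal evals
        if any(xi < lo or xi > hi for xi, lo, hi in zip(x, lower, upper)):
            return -math.inf  # infeasible: no likelihood evaluation spent
        evals += 1
        return f(x)

    x = list(x0)
    fx = evaluate(x)
    while step > min_step and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for sign in (1.0, -1.0):
                trial = list(x)
                trial[i] += sign * step
                f_trial = evaluate(trial)
                if f_trial > fx:
                    x, fx, improved = trial, f_trial, True
                    break
            if improved:
                break
        if not improved:
            step /= 2  # refine the mesh after a failed poll
    return x, fx, evals


def toy_log_likelihood(params, recombinants=12, meioses=100, k=30, n=80):
    """Hypothetical objective: binomial log-likelihoods for a recombination
    fraction theta (12 recombinants in 100 meioses) and an allele frequency q
    (30 copies among 80 alleles). Its maximum is at theta=0.12, q=0.375."""
    theta, q = params
    if not (0 < theta < 1 and 0 < q < 1):
        return -math.inf  # log(0) is undefined on the boundary
    return (recombinants * math.log(theta)
            + (meioses - recombinants) * math.log(1 - theta)
            + k * math.log(q) + (n - k) * math.log(1 - q))


if __name__ == "__main__":
    best, value, n_evals = compass_search(
        toy_log_likelihood, x0=[0.25, 0.5], lower=[0.0, 0.0], upper=[0.5, 1.0])
    print(f"theta={best[0]:.4f} q={best[1]:.4f} after {n_evals} evaluations")
```

MADS keeps this poll-and-refine structure but replaces the fixed ±e_i directions with directions that change with the mesh and, asymptotically, become dense in the unit sphere, which underlies its stronger convergence theory. In a real analysis each call to the objective would be a full pedigree likelihood computation, so the evaluation counter returned here is the quantity that matters for running time.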

[1] Markus Perola, et al. Identifying flavor preference subgroups. Genetic basis and related eating behavior traits, 2014, Appetite.

[2] Giuseppe Nicosia, et al. Semiconductor device design using the BiMADS algorithm, 2013, J. Comput. Phys.

[3] David Goldman, et al. A large-scale candidate gene analysis of mood disorders: evidence of neurotrophic tyrosine kinase receptor and opioid receptor signaling dysfunction, 2013, Psychiatric Genetics.

[4] Louis J. Muglia, et al. A Potential Novel Spontaneous Preterm Birth Gene, AR, Identified by Linkage and Association Analysis of X Chromosomal Markers, 2012, PLoS ONE.

[5] Tero Hiekkalinna, et al. On the Superior Power of Likelihood-based Linkage Disequilibrium Mapping in Large Multiplex Families Compared to Population Based Case-control Designs, 2012.

[6] K. S. Thorne, et al. Einstein@Home all-sky search for periodic gravitational waves in LIGO S5 data, 2012, Physical Review D.

[7] Hannes Lohi, et al. A SEL1L Mutation Links a Canine Progressive Early-Onset Cerebellar Ataxia to the Endoplasmic Reticulum-Associated Protein Degradation (ERAD) Machinery, 2012, PLoS Genetics.

[8] Alejandro A. Schäffer, et al. On the statistical properties of family-based association tests in datasets containing both pedigrees and unrelated case-control samples, 2011, European Journal of Human Genetics.

[9] Alejandro A. Schäffer, et al. PSEUDOMARKER: A Powerful Program for Joint Linkage and/or Linkage Disequilibrium Analysis on Mixtures of Singletons and Related Individuals, 2011, Human Heredity.

[10] Mathieu Lemire, et al. Coordinated Conditional Simulation with SLINK and SUP of Many Markers Linked or Associated to a Trait in Large Pedigrees, 2011, Human Heredity.

[11] Sébastien Le Digabel, et al. Use of quadratic models with mesh-adaptive direct search for constrained black box optimization, 2011, Optim. Methods Softw.

[12] Vincent Garnier, et al. Snow Water Equivalent Estimation Using Blackbox Optimization, 2011.

[13] A. Palotie, et al. A visual migraine aura locus maps to 9q21-q22, 2010, Neurology.

[14] Charles Audet, et al. OrthoMADS: A Deterministic MADS Instance with Orthogonal Directions, 2008, SIAM J. Optim.

[15] Sudha Seshadri, et al. Framingham Heart Study 100K project: genome-wide associations for cardiovascular disease outcomes, 2007, BMC Medical Genetics.

[16] Manuel A. R. Ferreira, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses, 2007, American Journal of Human Genetics.

[17] Jaakko Kaprio, et al. Twin studies in Finland 2006, 2006, Twin Research and Human Genetics.

[18] Charles Audet, et al. Nonsmooth optimization through Mesh Adaptive Direct Search and Variable Neighborhood Search, 2006, J. Glob. Optim.

[19] Gonçalo R. Abecasis, et al. PEDSTATS: descriptive statistics, graphics and quality assessment for gene mapping data, 2005, Bioinformatics.

[20] Aarno Palotie, et al. Chromosome 19p13 loci in Finnish migraine with aura families, 2005, American Journal of Medical Genetics Part B: Neuropsychiatric Genetics.

[21] J. Kaprio, et al. Genetic and Environmental Factors in Health-related Behaviors: Studies on Finnish Twins and Twin Families, 2002, Twin Research.

[22] Jeanette C. Papp, et al. A susceptibility locus for migraine with aura, on chromosome 4q24, 2002, American Journal of Human Genetics.

[23] J. S. Sinsheimer, et al. Chromosome 1 loci in Finnish schizophrenia families, 2001, Human Molecular Genetics.

[24] K. Roeder, et al. A SAS Procedure Based on Mixture Models for Estimating Developmental Trajectories, 2001.

[25] H. H. Göring, et al. Linkage analysis in the presence of errors IV: joint pseudomarker analysis of linkage and/or linkage disequilibrium on a mixture of pedigrees and singletons when the mode of inheritance cannot be accurately specified, 2000, American Journal of Human Genetics.

[26] J. D. Terwilliger, et al. Genomewide scan for familial combined hyperlipidemia genes in Finnish families, suggesting multiple susceptibility loci influencing triglyceride, cholesterol, and apolipoprotein B levels, 1999, American Journal of Human Genetics.

[27] J. R. O'Connell, et al. PedCheck: a program for identification of genotype incompatibilities in linkage analysis, 1998, American Journal of Human Genetics.

[28] Jurg Ott, et al. Handbook of Human Genetic Linkage, 1994.

[29] A. A. Schäffer, et al. Faster sequential genetic linkage computations, 1993, American Journal of Human Genetics.

[30] John E. Dennis, et al. Direct Search Methods on Parallel Machines, 1991, SIAM J. Optim.

[31] J. Ott, et al. Counting methods (EM algorithm) in human pedigree analysis: Linkage and segregation analysis, 1977, Annals of Human Genetics.

[32] R. Elston, et al. A general model for the genetic analysis of pedigree data, 1971, Human Heredity.

[33] Jean-Baptiste Hiriart-Urruty, et al. Optimal, Environmentally Friendly Departure Procedures for Civil Aircraft, 2011.

[34] Sébastien Le Digabel, et al. Algorithm xxx: NOMAD: Nonlinear Optimization with the MADS algorithm, 2010.

[35] Charles Audet, et al. Mesh Adaptive Direct Search Algorithms for Constrained Optimization, 2006, SIAM J. Optim.

[36] J. Terwilliger, et al. A haplotype-based 'haplotype relative risk' approach to detecting allelic associations, 1992, Human Heredity.

[37] D. E. Weeks, et al. Trials, tribulations, and triumphs of the EM algorithm in pedigree analysis, 1989, IMA Journal of Mathematics Applied in Medicine and Biology.

[38] R. C. Elston, et al. Age trends in human chiasma frequencies and recombination fractions. II. Method for analyzing recombination fractions and applications to the ABO:nail-patella linkage, 1976, American Journal of Human Genetics.