Utilizing Graph Theory to Select the Largest Set of Unrelated Individuals for Genetic Analysis

Many statistical analyses of genetic data rely on the assumption that samples are independent. Consequently, either relatedness is modeled in the analysis or samples are removed to “clean” the data of any pairwise relatedness above a tolerated threshold. Current methods do not maximize the number of unrelated individuals retained for further analysis, which needlessly wastes resources. We report a novel application of graph theory that, given a user-defined relatedness threshold, identifies all networks of related samples in a dataset as well as the maximum set of unrelated samples. We have implemented this method in an open-source program called Pedigree Reconstruction and Identification of a Maximum Unrelated Set (PRIMUS). We show that PRIMUS outperforms the three existing methods, allowing researchers to retain up to 50% more unrelated samples. A unique strength of PRIMUS is its ability to weight the maximum clique selection using additional criteria (e.g., affected status and data missingness). PRIMUS is a permanent solution for identifying the maximum number of unrelated samples for a genetic analysis.
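
The graph formulation described above can be illustrated with a short sketch. It is not the PRIMUS implementation. It only assumes the framing stated in the abstract: samples are nodes, and pairs whose relatedness estimate exceeds the threshold are joined to form the networks of related samples. Within each network, a set of mutually unrelated samples is a clique in the complementary "unrelated" graph. The relatedness values, the threshold of 0.1, the sample names, and the helper functions related_networks, maximal_cliques and max_unrelated_set are all hypothetical. Maximal cliques are enumerated with Bron-Kerbosch with pivoting. Ties between equally large cliques are assumed to be broken by a summed per-sample weight (for instance, 1 for an affected sample), which stands in for PRIMUS's weighting criteria.

```python
def related_networks(samples, relatedness, threshold):
    """Return (adjacency, networks): the graph joining pairs whose relatedness
    exceeds `threshold`, and its connected components (networks of relatives)."""
    adj = {s: set() for s in samples}
    for (a, b), value in relatedness.items():
        if value > threshold:
            adj[a].add(b)
            adj[b].add(a)
    seen, networks = set(), []
    for s in samples:
        if s in seen:
            continue
        # Iterative depth-first search collects one connected component.
        stack, comp = [s], set()
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        networks.append(comp)
    return adj, networks


def maximal_cliques(nodes, nbrs):
    """Bron-Kerbosch with pivoting: yield every maximal clique of the graph
    on `nodes`, where nbrs[v] is the set of neighbours of v."""
    def expand(r, p, x):
        if not p and not x:
            yield r  # r cannot be extended: it is a maximal clique
            return
        # Pivot on the vertex adjacent to most candidates to prune branches.
        pivot = max(p | x, key=lambda u: len(nbrs[u] & p))
        for v in list(p - nbrs[pivot]):
            yield from expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(nodes), set())


def max_unrelated_set(samples, relatedness, threshold, weight=None):
    """Largest set of mutually unrelated samples; among equally large sets,
    prefer the highest total weight (an assumed tie-break, not PRIMUS's)."""
    weight = weight or {}
    adj, networks = related_networks(samples, relatedness, threshold)
    keep = set()
    for net in networks:
        if len(net) == 1:
            keep |= net  # unrelated to every other sample: always kept
            continue
        # Complement graph within the network: edges join *unrelated* pairs,
        # so each clique here is a set of mutually unrelated samples.
        unrelated = {v: net - adj[v] - {v} for v in net}
        keep |= max(
            maximal_cliques(net, unrelated),
            key=lambda c: (len(c), sum(weight.get(v, 0) for v in c)),
        )
    return keep


# Hypothetical data: a trio (P1, P2 parents of C1), a sibling pair, one outsider.
samples = ["P1", "P2", "C1", "S1", "S2", "U"]
relatedness = {("P1", "C1"): 0.25, ("P2", "C1"): 0.25,
               ("S1", "S2"): 0.25, ("P1", "U"): 0.01}
affected = {"S2": 1}  # hypothetical weight: prefer the affected sibling
print(sorted(max_unrelated_set(samples, relatedness, 0.1, affected)))
# -> ['P1', 'P2', 'S2', 'U']
```

Samples in different networks share no above-threshold edge, so the union of the per-network maxima is a maximum unrelated set for the whole dataset. Enumerating maximal cliques is exponential in the worst case; running it separately on each network, and skipping singletons entirely, keeps the search manageable when networks are small.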
