REGENS: an open source Python package for simulating realistic autosomal genotypes

REcombinatory Genome ENumeration of Subpopulations (REGENS) is an open source Python package that simulates autosomal genotypes by concatenating real individuals’ genomic segments in a way that preserves their linkage disequilibrium (LD), the statistical association between alleles at different loci (Slatkin, 2008). Because real autosomal genotypes can be accurately modeled as sequences of genomic segments drawn from a finite pool of heritable association structures (LD haplotypes) (Druet, 2009), recombining real segments while preserving LD yields simulated autosomes that closely resemble those of the real input population (Shi, 2018).

REGENS can also simulate mono-allelic and epistatic single nucleotide variant (SNV) effects of any order without perturbing the simulated LD pattern. The SNVs involved in an effect can contribute additively, dominantly, recessively, only if heterozygous, or only if homozygous. All simulated effects contribute to the value of either a binary or a continuous biological trait (phenotype) that has a specified mean value and a specified amount of random noise.
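The mechanics described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the REGENS implementation or API: every name below (`recombine`, `ENCODINGS`, `phenotype`, `breakpoints`, `intercept`) is hypothetical. The sketch assumes genotypes are lists of minor-allele counts, takes breakpoints as given rather than sampling them from a recombination map, models an epistatic effect as the product of its SNVs' encoded values, reads "only if homozygous" as either homozygote, uses the specified mean as an intercept, and shows only the continuous phenotype.

```python
import random

# Hypothetical sketch, NOT the REGENS API. A genotype is a list of
# minor-allele counts (0, 1 or 2), one per SNV, in chromosomal order.

def recombine(real_genotypes, breakpoints, rng):
    """Build one simulated genotype by concatenating real segments.

    `breakpoints` (assumed given) are sorted SNV indices where the donor
    individual switches. All SNVs inside a segment come from the same
    real person, so the LD among them is copied over intact.
    """
    n_snvs = len(real_genotypes[0])
    bounds = [0, *breakpoints, n_snvs]
    simulated = []
    for start, end in zip(bounds[:-1], bounds[1:]):
        donor = rng.choice(real_genotypes)
        simulated.extend(donor[start:end])
    return simulated

# Assumed encodings for how a single SNV contributes to an effect.
ENCODINGS = {
    "additive": lambda g: g,                    # 0, 1, 2
    "dominant": lambda g: 1 if g >= 1 else 0,   # at least one minor allele
    "recessive": lambda g: 1 if g == 2 else 0,  # two minor alleles
    "het_only": lambda g: 1 if g == 1 else 0,   # heterozygous only
    "hom_only": lambda g: 1 if g != 1 else 0,   # either homozygote (assumed)
}

def phenotype(genotype, effects, intercept, noise_sd, rng):
    """Continuous trait: intercept + sum of effects + Gaussian noise.

    Each effect is (beta, [(snv_index, encoding_name), ...]). One SNV gives
    a mono-allelic effect; k SNVs give a k-way epistatic effect, modeled
    here (an assumption) as the product of the encoded values.
    """
    value = intercept
    for beta, snvs in effects:
        term = 1
        for index, encoding in snvs:
            term *= ENCODINGS[encoding](genotype[index])
        value += beta * term
    return value + rng.gauss(0.0, noise_sd)

# Example: three real individuals, four SNVs, one breakpoint before SNV 2.
rng = random.Random(0)
real = [[0, 1, 2, 1], [2, 2, 0, 0], [1, 0, 1, 2]]
sim = recombine(real, breakpoints=[2], rng=rng)
effects = [
    (0.5, [(0, "additive")]),                    # mono-allelic effect
    (1.0, [(1, "dominant"), (3, "recessive")]),  # 2-way epistatic effect
]
print(sim, phenotype(sim, effects, intercept=10.0, noise_sd=1.0, rng=rng))
```

Because each segment is copied whole from one real individual, associations between SNVs inside a segment are inherited exactly; only SNV pairs that straddle a breakpoint can have their association changed. The effects are computed from the genotype but never modify it, so adding them leaves the simulated LD pattern untouched.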

[1] Randal S. Olson, et al. Grid-based stochastic search for hierarchical gene-gene interactions in population-based genetic studies of common human diseases. 2017, BioData Mining.

[2] Deborah A. Nickerson, et al. Allele Frequency Matching Between SNPs Reveals an Excess of Linkage Disequilibrium in Genic Regions of the Human Genome. 2006, PLoS Genetics.

[3] Montgomery Slatkin, et al. Linkage disequilibrium — understanding the evolutionary past and mapping the medical future. 2008, Nature Reviews Genetics.

[4] Marylyn D. Ritchie, et al. Generating Linkage Disequilibrium Patterns in Data Simulations Using genomeSIMLA. 2008, EvoBIO.

[5] Yun S. Song, et al. Inference and analysis of population-specific fine-scale recombination maps across 26 diverse human populations. 2019, Science Advances.

[6] D. Postma, et al. Missing heritability: is the gap closing? An analysis of 32 complex traits in the Lifelines Cohort Study. 2017, European Journal of Human Genetics.

[7] Min Shi, et al. Simulating autosomal genotypes with realistic linkage disequilibrium and a spiked-in genetic effect. 2017, BMC Bioinformatics.

[8] Tom Druet, et al. A Hidden Markov Model Combining Linkage and Linkage Disequilibrium Information for Haplotype Reconstruction and Quantitative Trait Locus Fine Mapping. 2010, Genetics.

[9] E. Topol, et al. The personal and clinical utility of polygenic risk scores. 2018, Nature Reviews Genetics.

[10] Gabor T. Marth, et al. A global reference for human genetic variation. 2015, Nature.