Genome diversity in Ukraine

The main goal of this collaborative effort is to provide genome-wide data for a previously underrepresented population in Eastern Europe, and to cross-validate genome sequences and genotypes of the same individuals acquired by different technologies. We collected 97 genome-grade DNA samples from individuals representing the major regions of Ukraine, all of whom consented to public data release. DNBSEQ-G50 sequences and Illumina GWAS chip genotypes were cross-validated on multiple samples and additionally compared against one sample resequenced at high coverage on an Illumina NovaSeq 6000 (S4 flow cell). We searched the genome data for variation present in this population and report several classes of variants: large structural variants, indels, CNVs, SNPs, and microsatellites. This study provides the largest survey to date of genetic variation in Ukraine, creating a public reference resource for historical and medical research on a large, understudied population. While most common variation is shared with other European populations, this survey contributes novel SNPs and structural variants that have not been reported in gnomAD or 1000 Genomes (1KG), the databases representing the global distribution of genomic variation. These endemic variants will be a valuable resource for designing future population and clinical studies, will help address questions about ancestry and admixture, and will fill a missing piece in the picture of human population diversity in Eastern Europe. Our results indicate that the genetic diversity of the Ukrainian population has been uniquely shaped by evolutionary and demographic forces and cannot be ignored in future genetic and biomedical studies. These data contribute a wealth of new information that may reveal distinct risk and protective alleles.
The newly discovered low-frequency and local variants can be added to current genotyping arrays for genome-wide association studies, clinical trials, and genomic assessment of proliferating cancer cells.
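Cross-validating sequencing calls against array genotypes typically comes down to measuring genotype concordance at sites assayed by both platforms. The sketch below illustrates that idea only and is not the study's pipeline. It assumes genotypes have already been extracted (for example from the DNBSEQ-G50 VCF and the array export) into dictionaries keyed by a `(chrom, pos, ref, alt)` tuple, with array alleles already harmonized to the reference strand, and values given as alternate-allele dosage 0/1/2 (`None` for a missing call). The function name `genotype_concordance` and the toy data are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical site key: (chromosome, 1-based position, REF allele, ALT allele).
# Assumes array alleles were already harmonized to the reference strand.
Site = Tuple[str, int, str, str]


def genotype_concordance(
    seq_calls: Dict[Site, Optional[int]],
    array_calls: Dict[Site, Optional[int]],
) -> Dict[str, float]:
    """Compare alt-allele dosages (0, 1, 2; None = no call) at shared sites.

    Returns overall concordance plus non-reference concordance, the latter
    restricted to sites where either platform calls a non-reference genotype,
    so that the many shared hom-ref sites do not inflate agreement.
    """
    # Sites present in both call sets with a genotype on both platforms.
    shared = [
        site for site in seq_calls.keys() & array_calls.keys()
        if seq_calls[site] is not None and array_calls[site] is not None
    ]
    matches = sum(seq_calls[s] == array_calls[s] for s in shared)

    # Non-reference subset: at least one platform reports an ALT allele.
    nonref = [s for s in shared if seq_calls[s] > 0 or array_calls[s] > 0]
    nonref_matches = sum(seq_calls[s] == array_calls[s] for s in nonref)

    return {
        "shared_sites": len(shared),
        "concordance": matches / len(shared) if shared else float("nan"),
        "nonref_concordance": nonref_matches / len(nonref) if nonref else float("nan"),
    }


if __name__ == "__main__":
    # Toy genotypes for one individual (invented, not from the study).
    seq = {
        ("chr1", 1000, "A", "G"): 0,
        ("chr1", 2000, "C", "T"): 1,
        ("chr2", 3000, "G", "A"): 2,
        ("chr2", 4000, "T", "C"): 1,
        ("chr3", 5000, "A", "C"): None,  # no call from sequencing
    }
    arr = {
        ("chr1", 1000, "A", "G"): 0,
        ("chr1", 2000, "C", "T"): 1,
        ("chr2", 3000, "G", "A"): 1,  # het on array vs hom-alt from sequencing
        ("chr2", 4000, "T", "C"): 1,
        ("chr3", 5000, "A", "C"): 0,
        ("chr4", 6000, "G", "T"): 0,  # absent from sequencing calls
    }
    print(genotype_concordance(seq, arr))
    # → {'shared_sites': 4, 'concordance': 0.75, 'nonref_concordance': 0.6666666666666666}
```

Non-reference concordance is reported alongside the overall figure because, in any one individual, most array sites are homozygous reference, which makes overall concordance look optimistic.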