Deep whole-genome sequencing of 100 southeast Asian Malays.

Whole-genome sequencing across multiple samples in a population provides an unprecedented opportunity for comprehensively characterizing the polymorphic variants in the population. Although the 1000 Genomes Project (1KGP) has offered brief insights into the value of population-level sequencing, its low coverage has compromised the ability to confidently detect rare and low-frequency variants. In addition, the composition of populations in the 1KGP is not complete, even though the study design has been extended to more than 2,500 samples from more than 20 population groups. The Malays are one of the Austronesian groups predominantly present in Southeast Asia and Oceania, and the Singapore Sequencing Malay Project (SSMP) aims to perform deep whole-genome sequencing of 100 healthy Malays. By sequencing at a minimum of 30× coverage, we have illustrated the higher sensitivity for detecting low-frequency and rare variants and the ability to investigate the presence of hotspots of functional mutations. Compared to the low-pass sequencing in the 1KGP, the deeper coverage allows more functional variants to be identified for each person. A comparison of the fidelity of genotype imputation in Malays indicated that, for common SNPs, a population-specific reference panel such as the SSMP outperforms a cosmopolitan panel with a larger number of individuals. For lower-frequency (<5%) markers, a larger number of individuals might have to be whole-genome sequenced to achieve the accuracy currently afforded by the 1KGP. The SSMP data are expected to be the benchmark for evaluating the value of deep population-level sequencing versus low-pass sequencing, especially in populations that are poorly represented in population-genetics studies.
