The diploid genome sequence of an Asian individual

Here we present the first diploid genome sequence of an Asian individual. The genome was sequenced to 36-fold average coverage using massively parallel sequencing technology. We aligned the short reads onto the NCBI human reference genome to 99.97% coverage and, guided by the reference genome, used uniquely mapped reads to assemble a high-quality consensus sequence for 92% of the Asian individual’s genome. We identified approximately 3 million single-nucleotide polymorphisms (SNPs) within this region, of which 13.6% were not in the dbSNP database. Genotyping analysis showed that SNP identification was highly accurate and consistent, indicating the high sequence quality of this assembly. We also carried out heterozygote phasing and haplotype prediction against HapMap CHB and JPT haplotypes (Chinese and Japanese, respectively), sequence comparison with the two available individual genomes (J. D. Watson and J. C. Venter), and structural variation identification. We assessed these variations for their potential biological impact. Our sequence data and analyses demonstrate the potential usefulness of next-generation sequencing technologies for personal genomics.
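
To put the dbSNP figure in absolute terms, the arithmetic can be sketched as below. This is only an illustration built on the rounded numbers quoted above (about 3 million SNPs, 13.6% absent from dbSNP); the function name `novel_snp_count` is hypothetical, and the exact counts in the full analysis will differ from these rounded estimates.

```python
def novel_snp_count(total_snps: int, novel_fraction: float) -> int:
    """Estimate how many SNPs are absent from dbSNP, given a total and a fraction."""
    return round(total_snps * novel_fraction)


# Rounded figures taken from the abstract; both are approximations.
total_snps = 3_000_000   # "approximately 3 million" SNPs
novel_fraction = 0.136   # 13.6% not in dbSNP

novel = novel_snp_count(total_snps, novel_fraction)
known = total_snps - novel

print(f"SNPs not in dbSNP:      ~{novel:,}")
print(f"SNPs already in dbSNP:  ~{known:,}")
```

Under these rounded inputs, roughly four hundred thousand of the identified SNPs would be new relative to dbSNP at the time.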
