The South Asian Genome

The genetic sequence variation of people from the Indian subcontinent, who comprise one-quarter of the world's population, is not well described. We carried out whole-genome sequencing of 168 South Asians, along with whole-exome sequencing of 147 South Asians to provide deeper characterisation of coding regions. We identified 12,962,155 autosomal sequence variants, including 2,946,861 novel SNPs and 312,738 novel indels. This catalogue of SNPs and indels amongst South Asians provides the first comprehensive map of genetic variation in this major human population and reveals evidence for selective pressures on genes involved in skin biology, metabolism, infection and immunity. Our results will accelerate the search for the genetic variants underlying susceptibility to disorders such as type 2 diabetes and cardiovascular disease, which are highly prevalent amongst South Asians.
