Systematic analysis and functional annotation of variations in the genome of an Indian individual

Whole genome sequencing of personal genomes has revealed a large repertoire of genomic variations and has provided a rich template for identifying common and rare variants, in addition to understanding the genetic basis of diseases. The widespread application of personal genome sequencing in clinical settings for predictive and preventive medicine has been limited by the lack of comprehensive computational analysis pipelines. We used next‐generation sequencing technology to sequence the whole genome of a self‐declared healthy male of Indian origin. We generated sequence data at around 28X depth, covering over 99% of the reference human genome. Analysis revealed over 3 million single nucleotide variations and about 490,000 small insertion–deletion events, including several novel variants. Using this dataset as a template, we designed a comprehensive computational analysis pipeline for the systematic analysis and annotation of functionally relevant variants in the genome. This study follows a systematic and intuitive data analysis workflow to annotate genome variations and their potential functional effects. Moreover, we integrate predictive analysis of pharmacogenomic traits, with emphasis on drugs for which pharmacogenomic testing has been recommended. This study thus provides a template for genome‐scale analysis of personal genomes for personalized medicine. Hum Mutat 33:1133–1140, 2012. © 2012 Wiley Periodicals, Inc.
