Human genome epidemiology, progress and future

Human genome epidemiology (HuGE) systematically applies epidemiologic methods to assess the impact of human genetic variation on health and disease. Over the past ten years, the field has made great progress alongside advances in genomic technologies, which have made it possible to examine genetic variants in large samples at sufficiently low cost. Population-based genetic association studies provide a powerful approach for identifying variants or genes associated with a disease of interest by comparing the distributions of genetic variants between affected and unaffected individuals. A critical problem remains: findings from genetic association studies are not consistently reproducible, possibly because of false positives, false negatives, or true variability in association among populations. Before 2006, only a few low-penetrance variants outside the HLA locus had been found to be reproducibly associated with disease susceptibility using the candidate gene approach[1]. With the application of high-throughput genotyping technology, genome-wide association studies (GWAS) have emerged as a powerful tool for investigating the genetic architecture of complex diseases without a prior hypothesis about a particular gene or locus. In the GWAS approach, several hundred thousand to more than a million single nucleotide polymorphisms (SNPs) are assayed across the whole genome in thousands of individuals[2]. To date, GWAS have led to the discovery of thousands of loci associated with a wide range of human diseases and traits. These findings have advanced our knowledge of the genetics of complex diseases and provided new insights for identifying therapeutic targets and developing targeted interventions[3].
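The case-control comparison described above can be illustrated with a single-SNP allelic test. The following is a minimal sketch, not the method of any study cited here. It assumes a 2x2 table of allele counts (the function name, the counts in the usage note, and the figure of one million tests are all hypothetical), computes a Pearson chi-square statistic with one degree of freedom together with the allelic odds ratio, and defines a Bonferroni-style genome-wide threshold.

```python
import math

# Assumption: a Bonferroni-style threshold for one million tested SNPs.
GENOME_WIDE_ALPHA = 0.05 / 1_000_000


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on allele counts.

    Returns (chi2, p_value, odds_ratio) for the alternate allele.
    """
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_cases = case_alt + case_ref
    row_ctrls = ctrl_alt + ctrl_ref
    col_alt = case_alt + ctrl_alt
    col_ref = case_ref + ctrl_ref
    denominator = row_cases * row_ctrls * col_alt * col_ref
    if denominator == 0:
        raise ValueError("every row and column of the table must be non-empty")

    cross = case_alt * ctrl_ref - case_ref * ctrl_alt
    chi2 = n * cross ** 2 / denominator

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))

    # Allelic odds ratio; infinite when the off-diagonal product is zero.
    off_diagonal = case_ref * ctrl_alt
    odds_ratio = (case_alt * ctrl_ref) / off_diagonal if off_diagonal else math.inf
    return chi2, p_value, odds_ratio
```

With hypothetical counts of 600 alternate and 1400 reference alleles in cases and 500 and 1500 in controls, `allelic_association(600, 1400, 500, 1500)` gives a chi-square of about 12.5 and an odds ratio of about 1.29. The resulting p-value is nominally significant but far above `GENOME_WIDE_ALPHA`, which illustrates why modest effects require very large samples to be detected genome-wide.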
Given the advances in association studies of complex diseases, especially cancer, two reviews have been invited in the current issue to comment on the progress of HuGE in head and neck cancer[4] and bladder cancer[5]. The SNP chips used for GWAS are designed to provide good coverage of common SNPs; however, they may fail to interrogate all common variants in certain regions and have very limited power to capture rare and low-frequency variants[6]. Such variants can hardly be studied with the currently available genome-wide genotyping technologies. To date, only a small fraction of disease heritability can be explained by known susceptibility loci. For example, more than 40 prostate cancer susceptibility loci have been reported; however, these account for only approximately 25% of the familial risk of the disease[7]. In this regard, genetic association studies should direct more effort toward variants in or around genes and regions already implicated in disease pathogenesis. Functional variants in such genes and regions are particularly strong candidates. Several reports in this issue demonstrate the necessity and importance of association studies based on functional candidate genes for identifying variants associated with disease susceptibility or prognosis[8]–[10]. Of course, in the absence of replication, these findings need to be validated in other studies with independent samples. In addition, we suggest that more effort be devoted to regions identified as disease-relevant through GWAS, using a targeted-region approach, which has been shown to be an efficient and cost-effective way to screen for rare and low-frequency polymorphisms and thereby increase the disease variance explained[11]. Beyond the issues discussed above, rare variants, epistasis, epigenetics and genotype-environment interactions may also contribute to the missing heritability[12].
In contrast to the common variants identified by GWAS (in general, minor allele frequency above 5%), rarer variants are poorly detected by available genotyping arrays. Structural variation, including copy number variants (CNVs), inversions, translocations, microsatellites, repeat expansions, insertions of new sequence, complex rearrangements, and short insertions or deletions (indels), may also account for some of the unexplained heritability and is poorly captured by existing arrays[13]. In the current issue, Wang et al.[14] explored the landscape and impact on lung cancer of a new form of genomic variation, runs of homozygosity (ROHs). An ROH is a continuous, uninterrupted stretch of genomic sequence without heterozygosity in the diploid state; ROHs have been poorly investigated to date. Using an existing GWAS dataset of 1473 lung cancer cases and 1962 controls, they identified a new region at 14q23.1 that was consistently associated with lung cancer risk in a Chinese population[14], suggesting that ROHs may also be responsible for part of the unexplained familial risk of disease. Epistasis and gene-environment interactions may likewise contribute to a large percentage of disease variance. However, they remain difficult to detect given the current state of epidemiological study design, exposure assessment, and analytical methods. Long-term environmental exposure assessment will become more important in future epidemiological study designs. Without doubt, genetic association studies, especially GWAS, have made great progress in elucidating the genetic factors underlying complex diseases. However, several barriers remain to be overcome. GWAS-identified SNPs may point to functional variants but are unlikely to be the causative variants themselves, since for any given association signal there will often be several variants in strong linkage disequilibrium (LD) that show more or less equivalent evidence of association.
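To make the LD argument above concrete, the following minimal sketch computes the squared correlation r² between two biallelic SNPs from phased haplotypes. It is an illustration under stated assumptions rather than a procedure from the cited studies: haplotypes are assumed to be phased and coded as pairs of 0/1 alleles, and the function name is hypothetical.

```python
def ld_r_squared(haplotypes):
    """Squared correlation (r^2) between two biallelic SNPs.

    haplotypes: list of (allele_a, allele_b) pairs, each allele coded 0 or 1,
    one pair per phased chromosome (an assumed input format).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")

    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b                       # LD coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic in the sample")
    return d * d / denominator
```

When two SNPs always occur on the same haplotypes, r² equals 1 and an association test cannot distinguish between them, which is why a GWAS signal alone rarely pinpoints the causative variant.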
Extensive sequencing of the identified region, followed by a well-designed fine-mapping study in multiple populations, may help to narrow an association signal to potentially causative variants[15]. Furthermore, much additional work is needed to determine the functional basis of the observed associations. In particular, the potential for variants identified in GWAS to predict the risk of complex diseases has been anticipated, but the usefulness of translating these fundamental genetic findings to the bedside remains debatable[16]. Nevertheless, genetic prediction already offers a number of benefits over classical non-genetic models. For instance, genetic risk prediction is more stable over time than prediction based on traditional risk factors, as a person's germline genetic sequence is essentially constant throughout life. Recently, Sun et al.[17] reported that a genetic score calculated from variants discovered through association studies is an objective and better measurement of inherited risk of prostate cancer than family history. Family history can be obtained without a laboratory test, but it is influenced by family size, the age and survival status of male relatives, recall ability, family communication, and the prevalence of the disease in the population. With increasing numbers of discovered genetic variants that can be used as biomarkers in future genetic risk prediction, we believe it may become feasible to identify a high-risk subset of the population for targeted diagnostic, preventive and therapeutic interventions for complex disorders.
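As a rough illustration of how a genetic score can be built from association results, the sketch below computes a generic weighted score: the sum of risk-allele counts, each weighted by the natural logarithm of the reported per-allele odds ratio. This is one common construction and is not claimed to be the exact score used by Sun et al.; the SNP identifiers and odds ratios are hypothetical, and the model assumes independent, multiplicative per-allele effects.

```python
import math


def genetic_risk_score(risk_allele_counts, odds_ratios):
    """Weighted genetic score on the log-odds scale.

    risk_allele_counts: dict mapping SNP id -> 0, 1 or 2 risk alleles carried.
    odds_ratios: dict mapping SNP id -> per-allele odds ratio from prior studies.
    Assumes independent SNPs with multiplicative per-allele effects.
    """
    score = 0.0
    for snp, odds_ratio in odds_ratios.items():
        if odds_ratio <= 0:
            raise ValueError(f"odds ratio for {snp} must be positive")
        count = risk_allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"risk allele count for {snp} must be 0, 1 or 2")
        score += count * math.log(odds_ratio)
    return score


# Hypothetical example: two SNPs with made-up identifiers and odds ratios.
example_odds_ratios = {"snp_a": 1.2, "snp_b": 1.5}
example_genotype = {"snp_a": 2, "snp_b": 1}
example_score = genetic_risk_score(example_genotype, example_odds_ratios)
```

Exponentiating the score gives the combined odds relative to a person carrying no risk alleles at the listed SNPs. Ranking individuals by such a score is one way a high-risk subset of a population could be identified.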

[1] Hongbing Shen, et al. Genome-wide analysis of runs of homozygosity identifies new susceptibility regions of lung cancer in Han Chinese, 2013, Journal of biomedical research.

[2] Hongbing Shen, et al. Prognostic assessment of apoptotic gene polymorphisms in non-small cell lung cancer in Chinese, 2013, Journal of biomedical research.

[3] Hongbing Shen, et al. Genetic variants in pseudogene E2F3P1 confer risk for HBV-related hepatocellular carcinoma in a Chinese population, 2013, Journal of biomedical research.

[4] Q. Wei, et al. Molecular epidemiology of DNA repair gene polymorphisms and head and neck cancer, 2013, Journal of biomedical research.

[5] Jiachun Lu, et al. Effect of EME1 exon variant Ile350Thr on risk and early onset of breast cancer in southern Chinese women, 2013, Journal of biomedical research.

[6] Meilin Wang, et al. Bladder cancer epidemiology and genetic susceptibility, 2013, Journal of biomedical research.

[7] Jianfeng Xu, et al. Genetic score is an objective and better measurement of inherited risk of prostate cancer than family history, 2013, European urology.

[8] A. Ziegler, et al. The Promise and Limitations of Genome-wide Association Studies, 2012.

[9] Joshua M. Korn, et al. Deep resequencing of GWAS loci identifies independent rare variants associated with inflammatory bowel disease, 2011, Nature Genetics.

[10] Suzanne Chambers, et al. Seven prostate cancer susceptibility loci identified by a multi-stage genome-wide association study, 2011, Nature Genetics.

[11] C. Carlson, et al. Principles for the post-GWAS functional characterization of cancer risk loci, 2011, Nature Genetics.

[12] Teri A Manolio, et al. Genomewide association studies and assessment of the risk of disease, 2010, The New England journal of medicine.

[13] G. Gibson. Hints of hidden heritability in GWAS, 2010, Nature Genetics.

[14] A. Singleton, et al. Genomewide association studies and human disease, 2009, The New England journal of medicine.

[15] Judy H. Cho, et al. Finding the missing heritability of complex diseases, 2009, Nature.

[16] M. McCarthy, et al. An evaluation of HapMap sample size and tagging SNP performance in large-scale empirical and simulated data sets, 2005, Nature Genetics.

[17] E. Lander, et al. Meta-analysis of genetic association studies supports a contribution of common variants to susceptibility to common disease, 2003, Nature Genetics.