Molecular Reclassification of Crohn’s Disease: A Cautionary Note on Population Stratification

Patients with the same complex disease commonly differ in their phenotypic characteristics; for example, patients with Crohn’s disease (CD) are heterogeneous with regard to disease location and disease extent. Genetic susceptibility to CD is widely acknowledged and has been demonstrated by the identification of over 100 CD-associated genetic loci. However, relating CD subphenotypes to disease susceptibility loci has proven difficult. In this paper we discuss the use of cluster analysis on genetic markers to identify genetically based subgroups while taking into account possible confounding by population stratification. We show that this confounding must be considered explicitly; otherwise, the detected clusters may be strongly related to population groups rather than to disease-specific groups. We therefore explain how principal components (PCs) can be used to correct for population stratification while clustering affected individuals into genetically based subgroups. The PCs are obtained from 30 ancestry informative markers (AIMs), and the first two PCs are found to discriminate between the continental origins of the affected individuals. Genotypes at 51 CD-associated single nucleotide polymorphisms (SNPs) are used to perform latent class analysis, hierarchical cluster analysis and Partitioning Around Medoids (PAM) cluster analysis in a sample of affected individuals, both with and without using the PCs to adjust for population stratification. Without correction, the clusters appear to be influenced by population stratification; with correction, the clusters are unrelated to the continental origin of the individuals.
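
The abstract does not specify exactly how the PCs enter each clustering method. The sketch below assumes one simple option: regress each SNP genotype on an intercept plus the first two ancestry PCs and cluster the residuals. It then compares PAM clusters with and without that step on simulated data containing two ancestral populations and no disease subgroups. The allele frequencies, the Euclidean distance, the residualization step and all function names (`ancestry_pcs`, `regress_out`, `pam`, `simulate`) are illustrative assumptions rather than the authors' pipeline, and latent class analysis and hierarchical clustering are left out.

```python
import numpy as np


def ancestry_pcs(aim, n_pcs=2):
    """PC scores from an AIM genotype matrix (individuals x markers, coded 0/1/2)."""
    X = np.asarray(aim, dtype=float)
    p = X.mean(axis=0) / 2.0                   # estimated allele frequency per marker
    sd = np.sqrt(2.0 * p * (1.0 - p))          # binomial standard deviation per marker
    sd[sd == 0] = 1.0                          # monomorphic markers: avoid division by zero
    Z = (X - 2.0 * p) / sd                     # centre and scale each marker
    U, S, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]            # individual scores on the leading axes


def regress_out(snps, pcs):
    """Assumed adjustment: OLS residuals of each SNP on an intercept plus the PC scores."""
    Y = np.asarray(snps, dtype=float)
    D = np.column_stack([np.ones(len(Y)), pcs])
    beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return Y - D @ beta


def euclidean(X):
    """Pairwise Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def pam(dist, k, max_iter=100):
    """Simplified PAM on a distance matrix: greedy BUILD, then first-improvement SWAP."""
    n = dist.shape[0]
    medoids = [int(np.argmin(dist.sum(axis=1)))]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        # Gain of candidate c: total reduction in distance to the nearest medoid.
        gains = np.maximum(nearest[:, None] - dist, 0.0).sum(axis=0)
        gains[medoids] = -1.0                  # never re-pick an existing medoid
        medoids.append(int(np.argmax(gains)))

    def cost(m):
        return dist[:, m].min(axis=1).sum()

    best = cost(medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = c                   # swap medoid i for non-medoid c
                trial_cost = cost(trial)
                if trial_cost < best - 1e-12:
                    medoids, best, improved = trial, trial_cost, True
        if not improved:
            break
    labels = np.argmin(dist[:, medoids], axis=1)
    return np.array(medoids), labels


def agreement(labels, pop):
    """Share of individuals whose 2-cluster label matches their population, up to relabelling."""
    m = np.mean(labels == pop)
    return max(m, 1.0 - m)


def simulate(n_per_pop=60, n_aim=30, n_snp=51, seed=1):
    """Hypothetical data: two populations, no disease subgroups, so any structure is ancestry."""
    rng = np.random.default_rng(seed)
    pop = np.repeat([0, 1], n_per_pop)
    p_aim = np.where(pop[:, None] == 0, 0.8, 0.2) * np.ones((1, n_aim))
    p_snp = np.where(pop[:, None] == 0, 0.75, 0.25) * np.ones((1, n_snp))
    return rng.binomial(2, p_aim), rng.binomial(2, p_snp), pop


aim, snp, pop = simulate()
pcs = ancestry_pcs(aim, n_pcs=2)
_, raw_labels = pam(euclidean(snp.astype(float)), k=2)
_, adj_labels = pam(euclidean(regress_out(snp, pcs)), k=2)
print("unadjusted agreement with population:", round(agreement(raw_labels, pop), 2))
print("PC-adjusted agreement with population:", round(agreement(adj_labels, pop), 2))
```

In this toy setting, the unadjusted clusters should largely reproduce the two populations, while the PC-adjusted clusters should agree with them only at about chance level. This mirrors the pattern the abstract describes. Residualizing on the PCs is only one way to adjust; other approaches could equally be used with the same PCs.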
