Genetic differences among ethnic groups

Background
Many differences among ethnic groups have been observed, including skin color, eye color, height, susceptibility to certain diseases, and response to certain drugs. However, the genetic bases of these differences remain under-investigated. Since the HapMap project, large-scale genotype data from Caucasian, African and Asian population samples have been available. The project found that these populations occupy distinct regions of a principal component analysis (PCA) plot. However, PCA is an unsupervised method and does not measure how each individual single nucleotide polymorphism (SNP) differs among populations.

Results
We applied a mutual information-based feature selection method to detect associations between SNP status and ethnic group, using the latest HapMap Phase 3 release (version 3), which includes more sub-populations. We identified a total of 299 SNPs, which accurately predicted the ethnicity of all HapMap populations. The sequential minimal optimization (SMO) model achieved a 10-fold cross-validation accuracy of 0.901 on the training dataset and an accuracy of 0.895 on an independent test dataset.

Conclusions
In-depth functional analysis of these SNPs and their nearby genes revealed the genetic bases of skin and eye color differences among populations.
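The abstract does not spell out the feature selection procedure. As one illustration of mutual information-based SNP selection, the sketch below scores each SNP by its empirical mutual information with the population label, then ranks SNPs greedily with a max-relevance, min-redundancy (mRMR) criterion in the style of Peng, Long and Ding. This is an assumption, not the authors' pipeline: the helper names `mutual_information` and `mrmr_select`, the 0/1/2 minor-allele-count genotype encoding, and the six-sample toy data are all hypothetical. The downstream SMO classifier and cross-validation are not shown.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Empirical mutual information I(X;Y), in bits, between two discrete sequences."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a,b) * log2(p(a,b) / (p(a) p(b))); zero-count cells never appear in `joint`
        mi += p_ab * math.log2(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_select(features: Dict[str, Sequence[int]], labels: Sequence[str], k: int) -> List[str]:
    """Greedily rank up to k features by relevance minus mean redundancy (mRMR, MID form).

    Hypothetical helper: one possible mutual information-based selector, not
    necessarily the method used in the study.
    """
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected: List[str] = []
    remaining = set(features)

    def score(name: str) -> float:
        if not selected:
            return relevance[name]
        # Mean mutual information with the SNPs already chosen (redundancy term)
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical by SNP name)
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: genotypes coded as minor-allele counts, six samples.
labels = ["CEU", "CEU", "YRI", "YRI", "CHB", "CHB"]
features = {
    "snp_a": [0, 0, 1, 1, 1, 1],  # separates CEU from the rest
    "snp_b": [0, 0, 1, 1, 1, 1],  # exact copy of snp_a (fully redundant)
    "snp_c": [1, 1, 0, 0, 1, 1],  # separates YRI from the rest
    "snp_d": [0, 1, 0, 1, 0, 1],  # unrelated to the population label
}

for name, col in features.items():
    print(name, round(mutual_information(col, labels), 3))
print("mRMR order:", mrmr_select(features, labels, k=2))
```

With `k=2`, the selector keeps `snp_a` and `snp_c` and skips the redundant copy `snp_b`, even though `snp_b` is as relevant to the label as `snp_a`; the order of the two chosen SNPs can depend on floating-point tie-breaking, since their relevance scores are equal. The redundancy penalty is what distinguishes this from simply ranking SNPs by individual association strength.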
