TagSNP-set selection for genotyping using integrated data

Abstract: Single-nucleotide polymorphisms (SNPs) are vital for identifying genetic-level variation in complex diseases. The information carried by SNPs on adjacent or identical genes can be represented by a small number of tagSNPs (called a tag SNP-set or tagSNP-set). In this work, we propose a novel method called TagSNP-set Selection by Optimal Iteration with Linkage Disequilibrium (TSOILD) and develop a quantitative tagSNP-set prediction method called the Physical Distance-Linkage Disequilibrium Prediction Method (PDLDPM). To verify the validity of TSOILD and PDLDPM, a large amount of test data is generated with the simulation software HAPGEN2. According to the experimental results, TSOILD improves prediction accuracy by 6.73%, 3.19%, 6.52% and 1.72% over Random Sampling, the Genetic Algorithm (GA), the Greedy Algorithm and the TagSNP-Set Selection Method with Maximum Information (TSMI), respectively. In addition, PDLDPM, Linkage Coverage and selection of tag SNPs to maximize prediction accuracy (STAMPA) are used to evaluate the tagSNP-sets selected by Random Sampling, GA, the Greedy Algorithm and TSMI. The results show that PDLDPM performs better than the other two evaluation methods. These methods provide effective assistance for studying the genetic-level variation of complex diseases.
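To make the idea of a tagSNP-set concrete, the following is a minimal sketch of a generic greedy selection based on pairwise linkage disequilibrium, in the spirit of the Greedy Algorithm baseline named in the abstract. It is not TSOILD or PDLDPM, whose details are not given here. The function names `r_squared` and `greedy_tag_snps`, the 0/1/2 genotype coding, the use of squared Pearson correlation as the LD measure, and the threshold of 0.8 are all assumptions made for illustration.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (assumed 0/1/2 coding)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: correlation undefined, treated here as no LD
    return (sxy * sxy) / (sxx * syy)


def greedy_tag_snps(genotypes, threshold=0.8):
    """Pick tag SNPs greedily so every SNP is in LD (r^2 >= threshold) with a tag.

    genotypes: dict mapping a SNP id to its list of genotypes across samples.
    Returns the selected tag SNP ids in selection order.
    """
    snps = list(genotypes)
    # Each SNP covers itself plus every SNP it is in strong LD with.
    covers = {s: {s} for s in snps}
    for a, b in combinations(snps, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            covers[a].add(b)
            covers[b].add(a)

    uncovered = set(snps)
    tags = []
    while uncovered:
        # Choose the SNP that covers the most still-uncovered SNPs.
        best = max(snps, key=lambda s: len(covers[s] & uncovered))
        tags.append(best)
        uncovered -= covers[best]
    return tags
```

In this sketch, SNPs that are perfectly correlated (positively or negatively) collapse onto a single tag, while an uncorrelated SNP must be kept as its own tag. Real pipelines usually restrict the LD computation to SNPs within a physical-distance window, which this illustration omits.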
