A brief survey on GWAS and ML algorithms

An increasing number of genomics studies seek better ways to detect and prevent diseases, and their findings can benefit the public considerably. The rapid development of genotyping technology has given researchers the opportunity to examine the genome in depth and study its genetic variants. Researchers frequently find different sets of variants that increase the risk of different diseases. Different populations may share the same risk variants or carry different ones. How these variants are associated with disease remains largely unknown, but thorough studies can uncover it. Studies that test genetic variants across the whole genome for association with a disease or trait are known as genome-wide association studies (GWAS). GWAS is not the sole domain of bioinformaticians and pure scientists. Computer scientists can also contribute by developing algorithms and tools. This paper therefore briefly introduces GWAS and points researchers to several studies that have applied machine learning (ML) algorithms to GWAS.
