Analysis of Multiple Single Nucleotide Polymorphisms of Candidate Genes Related to Coronary Heart Disease Susceptibility by Using Support Vector Machines

Abstract

Coronary heart disease (CHD) is a complex genetic disease involving gene-environment interaction. Many studies have reported associations between single nucleotide polymorphisms (SNPs) of candidate genes and CHD. We applied a new method to analyze such relationships using support vector machines (SVMs), a class of learning methods related to artificial neural networks. We assumed that the common haplotypes implicit in genotypes differ between cases and controls, and that this difference allows SVMs to classify subjects according to their genotypes. Fourteen SNPs of ten candidate genes were investigated in 86 CHD patients and 119 controls. Genotypes were transformed into a numerical vector by assigning scores based on the difference between each subject's genotypes and the reference genotypes, which represent the healthy normal population. The overall classification accuracy of the SVMs was 64.4%, with an area under the receiver operating characteristic (ROC) curve of 0.639. By conventional analysis using the χ² test, the SNP of the scavenger receptor B1 gene showed the most significant association with CHD in terms of allele frequencies in cases vs. controls (p = 0.0001). In conclusion, we suggest that the application of SVMs to association studies of SNPs in candidate genes shows considerable promise, and that further work on estimating CHD susceptibility in high-risk individuals would be worthwhile.
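The genotype scoring and the conventional allele-frequency test described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact scoring rule, so this example assumes that a score is the number of alleles (0, 1, or 2) by which a subject's genotype differs from the reference genotype. The function names, the reference genotypes, and the allele counts are all hypothetical and are not taken from the study.

```python
from collections import Counter


def genotype_score(genotype, reference):
    """Count the alleles in `genotype` that have no match in `reference`.

    Assumed scoring rule (not stated in the abstract): 0 = same genotype,
    1 = one allele differs, 2 = both alleles differ. Allele order is ignored.
    """
    remaining = Counter(reference)
    mismatches = 0
    for allele in genotype:
        if remaining[allele] > 0:
            remaining[allele] -= 1  # this allele is matched by the reference
        else:
            mismatches += 1
    return mismatches


def encode_subject(genotypes, references):
    """Turn one subject's genotypes into a numerical vector, one score per SNP."""
    return [genotype_score(g, r) for g, r in zip(genotypes, references)]


def allele_chi_square(case_counts, control_counts):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for a 2x2 table of allele counts: (allele 1, allele 2)
    in cases and in controls. Every row and column total must be nonzero.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical reference genotypes for three SNPs and one hypothetical subject.
references = ["AA", "CC", "GG"]
subject = ["AG", "CC", "TT"]
print(encode_subject(subject, references))  # [1, 0, 2]
```

Vectors of this kind, one per subject, would then serve as the input to an SVM classifier, while `allele_chi_square` corresponds to the single-SNP comparison of allele frequencies between cases and controls.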
