KvSNP: accurately predicting the effect of genetic variants in voltage-gated potassium channels

MOTIVATION: Non-synonymous single nucleotide polymorphisms (nsSNPs) in voltage-gated potassium (Kv) channels cause diseases with potentially fatal consequences in seemingly healthy individuals. Identifying disease-causing genetic variation will aid presymptomatic diagnosis and treatment of such disorders. nsSNP-effect predictors are hypothesized to perform best when developed for specific gene families. We therefore created KvSNP, a method that assigns a disease-causing probability to Kv-channel nsSNPs.

RESULTS: KvSNP outperforms popular methods that are not gene-family-specific (SNPs&GO, SIFT and PolyPhen) in predicting the disease potential of Kv-channel variants, according to all tested metrics (accuracy, Matthews correlation coefficient and area under the receiver operating characteristic curve). Most notably, it increases the separation between the median predicted disease probabilities of benign and disease-causing SNPs by 26% relative to the next-best competitor. KvSNP has ranked 172 uncharacterized Kv-channel nsSNPs by disease-causing probability.

AVAILABILITY AND IMPLEMENTATION: KvSNP, a WEKA implementation, is available at www.bioinformatics.leeds.ac.uk/KvDB/KvSNP.html.

CONTACT: d.r.westhead@leeds.ac.uk

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
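
The evaluation metrics named in the Results (accuracy, Matthews correlation coefficient, area under the ROC curve, and the separation of median predicted probabilities between benign and disease-causing variants) can be illustrated with a minimal Python sketch. This is not the KvSNP implementation, which is WEKA-based; the function names, the 0.5 decision threshold, and the toy labels and probabilities below are all hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import median


def confusion(labels, probs, threshold=0.5):
    """Count TP, FP, TN, FN; label 1 = disease-causing, 0 = benign.
    The 0.5 threshold is an assumption, not a value from the paper."""
    tp = fp = tn = fn = 0
    for y, p in zip(labels, probs):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of variants classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when a margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def split_by_class(labels, probs):
    """Separate predicted probabilities into disease and benign groups."""
    disease = [p for y, p in zip(labels, probs) if y == 1]
    benign = [p for y, p in zip(labels, probs) if y == 0]
    return disease, benign


def auc(labels, probs):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation:
    the fraction of disease/benign pairs ranked correctly, ties count 0.5."""
    disease, benign = split_by_class(labels, probs)
    wins = sum(
        1.0 if d > b else 0.5 if d == b else 0.0
        for d in disease
        for b in benign
    )
    return wins / (len(disease) * len(benign))


def median_separation(labels, probs):
    """Difference between the median predicted probability of
    disease-causing variants and that of benign variants."""
    disease, benign = split_by_class(labels, probs)
    return median(disease) - median(benign)


# Hypothetical toy data, not taken from the paper.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_probs = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

if __name__ == "__main__":
    counts = confusion(toy_labels, toy_probs)
    print("accuracy:", accuracy(*counts))
    print("MCC:", mcc(*counts))
    print("AUC:", auc(toy_labels, toy_probs))
    print("median separation:", median_separation(toy_labels, toy_probs))
```

Accuracy and MCC depend on the chosen threshold, whereas AUC and the median separation are computed directly from the predicted probabilities, which is why the latter two are useful when variants are ranked rather than hard-classified.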
