Prioritisation of candidate Single Amino Acid Polymorphisms using one-class learning machines

Recent advances in next-generation sequencing technology have enabled the direct sequencing of rare genetic variants in both case and control individuals. Although a few statistical methods have been proposed for uncovering potential associations between multiple rare variants and human inherited diseases, most of them rely on computational approaches to filter out non-functional variants in order to maximise statistical power. To address this need, we formulate the detection of genetic variants associated with a specific type of disease as a one-class novelty learning problem. We focus on a typical type of genetic variant, Single Amino Acid Polymorphisms (SAAPs), and take advantage of a feature selection mechanism and two one-class learning methods to prioritise candidate SAAPs. Systematic validation demonstrates that the proposed model is effective in recovering disease-associated SAAPs.
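The abstract does not say which two one-class learning methods are used. The following is therefore only a minimal sketch of the general idea, not the authors' method. It fits a Gaussian Parzen-window density estimate, a standard one-class technique, to feature vectors of known disease-associated SAAPs alone. It then ranks candidate SAAPs by how well they fit that positive-only model. The feature vectors, the bandwidth `sigma`, and the names `parzen_scores`, `prioritise`, `SAAP_A`, and `SAAP_B` are all hypothetical. The sketch also assumes that feature selection has already been applied.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def parzen_scores(train: Sequence[Vector], candidates: Sequence[Vector],
                  sigma: float = 1.0) -> List[float]:
    """Gaussian Parzen-window score of each candidate under a model
    fitted only to positive (disease-associated) examples.

    Higher scores mean the candidate lies closer to the positive class.
    """
    if not train:
        raise ValueError("need at least one positive training example")
    scores = []
    for c in candidates:
        total = 0.0
        for t in train:
            sq_dist = sum((ci - ti) ** 2 for ci, ti in zip(c, t))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        # Average kernel value in [0, 1]; the density's normalising
        # constant is omitted because it does not change the ranking.
        scores.append(total / len(train))
    return scores


def prioritise(train: Sequence[Vector], candidates: Sequence[Vector],
               ids: Sequence[str], sigma: float = 1.0) -> List[Tuple[str, float]]:
    """Rank hypothetical candidate identifiers by descending one-class score."""
    if len(ids) != len(candidates):
        raise ValueError("ids and candidates must have the same length")
    scores = parzen_scores(train, candidates, sigma)
    return sorted(zip(ids, scores), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical two-dimensional features of known disease-associated SAAPs.
    known = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.2]]
    candidates = [[0.0, 1.0], [0.95, 0.05]]
    for saap_id, score in prioritise(known, candidates, ["SAAP_B", "SAAP_A"], sigma=0.5):
        print(saap_id, round(score, 3))
```

With these toy vectors, `SAAP_A` ranks above `SAAP_B` because it lies close to the known examples. Only positive examples are needed to fit the model, which is what makes the formulation one-class. A one-class SVM or support vector data description could be swapped in for the density estimate without changing the ranking step.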
