Improved branch and bound algorithm for detecting SNP-SNP interactions in breast cancer

Background
Single nucleotide polymorphisms (SNPs) in genes from distinct pathways are associated with breast cancer risk. Identifying possible SNP-SNP interactions in genome-wide case–control studies is an important task when investigating the genetic factors that influence common complex traits, and the effects of these interactions need to be characterized. However, detecting the complex interplay (interactions) among SNPs in high-dimensional combinations remains computationally and methodologically challenging. We introduce an improved branch and bound algorithm with feature selection (IBBFS) to identify the SNP combinations with the maximal difference in allele frequencies between the case and control groups in breast cancer, i.e., the high/low risk combinations of SNPs.

Results
Data from 220 breast cancer cases and 334 controls were used to test IBBFS and identify significant SNP combinations. We used the odds ratio (OR) as a quantitative measure of the cancer risk associated with multiple SNP combinations, in order to identify the most likely SNP combinations and the complex biological relationships underlying the progression of breast cancer. In the low risk groups, the estimated ORs of the best SNP combinations with genotypes were significantly smaller than 1 (between 0.165 and 0.657). In the high risk groups, the estimated ORs of the predicted SNP combinations with genotypes were significantly greater than 1 (between 2.384 and 6.167).

Conclusions
This study proposes an effective, high-speed method for analyzing SNP-SNP interactions in breast cancer association studies. A number of important SNPs were found to be significant for the high/low risk groups; they can thus be considered potential predictors for breast cancer association.
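The abstract does not describe how IBBFS bounds its search or how its feature-selection step works. As an illustration of the general branch-and-bound idea only (not the authors' IBBFS), the sketch below searches SNP-genotype combinations for the one whose carrier frequency differs most between cases and controls. Its assumptions are: genotypes coded 0/1/2 per SNP, "frequency" meaning the fraction of subjects carrying every (SNP, genotype) pair in the combination, and a hypothetical `max_order` cap on combination size. The function name `best_combination` is likewise hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed layout: one row per subject, one genotype code (0, 1, 2) per SNP.
Genotypes = Sequence[Sequence[int]]
# A combination is a tuple of (snp_index, genotype_code) pairs.
Combo = Tuple[Tuple[int, int], ...]


def best_combination(cases: Genotypes, controls: Genotypes,
                     max_order: int = 3) -> Tuple[Combo, float]:
    """Branch and bound over SNP-genotype combinations (illustrative sketch).

    Maximizes |f_case - f_ctrl|, where f is the fraction of a group whose
    subjects carry every (SNP, genotype) pair in the combination.
    """
    n_snps = len(cases[0])
    best_combo: Combo = ()
    best_diff = 0.0

    def search(start: int, combo: Combo,
               case_idx: List[int], ctrl_idx: List[int]) -> None:
        nonlocal best_combo, best_diff
        f_case = len(case_idx) / len(cases)
        f_ctrl = len(ctrl_idx) / len(controls)
        if combo:
            diff = abs(f_case - f_ctrl)
            if diff > best_diff:
                best_combo, best_diff = combo, diff
        # Bound: adding pairs can only shrink both frequencies (both stay >= 0),
        # so every extension satisfies |f_case - f_ctrl| <= max(f_case, f_ctrl).
        # If that bound cannot beat the incumbent, prune this whole subtree.
        if len(combo) == max_order or max(f_case, f_ctrl) <= best_diff:
            return
        # Branch: extend with a later SNP (each SNP used at most once).
        for snp in range(start, n_snps):
            for g in (0, 1, 2):
                new_case = [i for i in case_idx if cases[i][snp] == g]
                new_ctrl = [i for i in ctrl_idx if controls[i][snp] == g]
                search(snp + 1, combo + ((snp, g),), new_case, new_ctrl)

    search(0, (), list(range(len(cases))), list(range(len(controls))))
    return best_combo, best_diff
```

The pruning rule is what makes this branch and bound rather than plain enumeration: because a longer combination is a stricter conjunction, its carrier frequencies can never grow, so `max(f_case, f_ctrl)` is a valid upper bound for the entire subtree. The result is still exact (the same optimum as exhaustive search up to `max_order`); only the running time depends on how early a strong incumbent is found.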
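The abstract reports ORs for SNP combinations as "significantly" below or above 1 but does not state how the ORs or their significance were computed. A common choice, assumed here, is a 2×2 table of carriers versus non-carriers of a genotype combination in cases versus controls, with a Woolf (log-based) 95% confidence interval; an interval lying entirely below 1 suggests a low-risk combination and one entirely above 1 a high-risk combination. The function name and the counts in the usage line are hypothetical, not the study's data.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(case_carriers: int, n_cases: int,
                       ctrl_carriers: int, n_controls: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """OR of carrying a genotype combination (cases vs. controls) with a
    Woolf confidence interval (z = 1.96 gives roughly 95%).

    2x2 table:      carrier   non-carrier
        cases          a           b
        controls       c           d
    """
    a = case_carriers
    b = n_cases - case_carriers
    c = ctrl_carriers
    d = n_controls - ctrl_carriers
    if min(a, b, c, d) == 0:
        # Haldane-Anscombe correction: add 0.5 to every cell so that the
        # ratio and log(OR) are defined when a cell is empty.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log_or)
    upper = math.exp(math.log(or_) + z * se_log_or)
    return or_, lower, upper


# Hypothetical counts: 30 of 220 cases and 80 of 334 controls are carriers.
# OR is about 0.50 and the whole 95% interval lies below 1 (protective).
print(odds_ratio_with_ci(30, 220, 80, 334))
```

The Woolf interval is simple and adequate for moderate cell counts; for very sparse combinations an exact interval (for example, one based on the conditional distribution of the table) is usually preferred.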
