A Novel Approach to Detecting Epistasis using Random Sampling Regularisation

Epistasis analysis is a progressive approach that complements the ‘common disease, common variant’ hypothesis by highlighting how connected networks of genetic variants may act together to produce a phenotype. It is commonly performed either pairwise, treating interactions as variant versus variant, or at unlimited arity, considering high-order interactions among many variants. This type of analysis multiplies the number of tests already performed in standard approaches such as the Genome-Wide Association Study (GWAS), where controlling the False Discovery Rate (FDR) is already an issue. Because the number of tests grows combinatorially with the number of variants, the FDR problem grows with it. Epistasis analysis also introduces its own limitations of computational complexity and intensity, which depend on the analysis performed: in the most intensive case, a multivariate analysis has a time complexity of O(n!). This paper proposes a novel methodology for detecting epistasis that uses interpretable methods and best practice to identify interactions through a series of filtering processes. At its core is Random Sampling Regularisation, which randomly splits the data into sample sets and applies a voting system across them to regularise the significance and reliability of the biological markers (SNPs). Preliminary results are promising, yielding a concise set of detected interactions. Applied to the classification of breast cancer patients, the method identified eight candidate risk interactions involving five variants, and a single candidate variant with a strong protective association.
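The abstract does not state which association test, split size, number of rounds or voting threshold Random Sampling Regularisation uses. The following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation: each SNP and each SNP pair is encoded with a hypothetical dominant (carrier) coding, tested with a 2x2 Pearson chi-square test, filtered with the Benjamini-Hochberg FDR procedure inside each random subsample, and then credited with one vote whenever it survives. All function names, parameters (`n_rounds`, `frac`, `alpha`, `seed`) and the synthetic data are illustrative assumptions.

```python
import math
import random
from itertools import combinations


def chi2_2x2_pvalue(a, b, c, d):
    """Pearson chi-square p-value (1 df, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # an empty row or column carries no evidence of association
        return 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2))


def benjamini_hochberg(pvalues, alpha):
    """Indices rejected by the Benjamini-Hochberg procedure at FDR level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * alpha:
            cutoff = rank  # largest rank meeting the step-up criterion
    return set(order[:cutoff])


def dominant_features(genotypes):
    """Hypothetical encoding: a sample carries a SNP if it has >= 1 minor allele
    (genotype 1 or 2) and carries a SNP pair if it carries both SNPs."""
    feats = {snp: [g > 0 for g in gs] for snp, gs in genotypes.items()}
    for s1, s2 in combinations(genotypes, 2):
        feats[f"{s1}x{s2}"] = [p and q for p, q in zip(feats[s1], feats[s2])]
    return feats


def random_sampling_votes(features, labels, n_rounds=50, frac=0.5, alpha=0.05, seed=0):
    """Hypothetical voting scheme: each round draws a random subsample, tests every
    feature against case/control status, applies BH-FDR, and gives one vote to each
    surviving feature. Returns the fraction of rounds in which each feature survived."""
    rng = random.Random(seed)
    names = list(features)
    votes = dict.fromkeys(names, 0)
    n = len(labels)
    for _ in range(n_rounds):
        idx = rng.sample(range(n), int(frac * n))
        pvals = []
        for name in names:
            x = features[name]
            a = sum(1 for i in idx if x[i] and labels[i])          # carrier, case
            b = sum(1 for i in idx if x[i] and not labels[i])      # carrier, control
            c = sum(1 for i in idx if not x[i] and labels[i])      # non-carrier, case
            d = sum(1 for i in idx if not x[i] and not labels[i])  # non-carrier, control
            pvals.append(chi2_2x2_pvalue(a, b, c, d))
        for j in benjamini_hochberg(pvals, alpha):
            votes[names[j]] += 1
    return {name: v / n_rounds for name, v in votes.items()}


# Synthetic toy data (not from the paper): 200 samples, genotypes coded 0/1/2.
# Case status is defined by carrying both rs_a and rs_b; rs_c is close to independent of it.
genotypes = {
    "rs_a": [i % 3 for i in range(200)],
    "rs_b": [(i // 3) % 3 for i in range(200)],
    "rs_c": [(i // 9) % 3 for i in range(200)],
}
labels = [a > 0 and b > 0 for a, b in zip(genotypes["rs_a"], genotypes["rs_b"])]

features = dominant_features(genotypes)
print(len(features))  # 6: three single-SNP tests plus three pairwise tests per round
votes = random_sampling_votes(features, labels)
for name, share in sorted(votes.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.2f}")
```

Even with three SNPs the pairwise step doubles the number of tests per round, which is why some multiple-testing control is applied inside each subsample. Note that a joint-carrier indicator also inherits any marginal effect of its member SNPs, so a pair can collect votes without a genuine interaction; separating interaction from main effects would need a different test, for example a logistic model with an interaction term.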
