A multivariate regression approach for identification of SNPs importance in prostate cancer

ABSTRACT Several genetic alterations are known to serve as genetic markers of prostate cancer (PCa), and the use of single nucleotide polymorphisms (SNPs) is one of the most promising areas of cancer research. The aim of the present research is to study the influence of pathways, with the help of models such as the recursive partitioning method, in order to detect the relevant SNPs and, consequently, PCa. Data are retrieved from the MCC-Spain database, with cases and controls selected as a heterogeneous group. Decision trees built by recursive partitioning allow splits that are not of interest to be pruned off. Multivariate adaptive regression spline (MARS) models are then trained on the selected pathways, and their performance is assessed in terms of the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. As a result, a dimensionality reduction and a tuning of the model parameters are determined, together with performance tests for researchers who work with genetic datasets. In our research, a total of 12 SNPs were found to be the most relevant in this database for PCa detection.
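The abstract scores the trained models by the AUC of the ROC curve. As a rough illustration of what that number measures, the sketch below computes the AUC as the fraction of (case, control) pairs in which the case receives the higher predicted score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are assumptions made up for this example; they are not taken from the MCC-Spain data or from the authors' implementation.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random case outscores a random control.

    labels: 1 for cases, 0 for controls (hypothetical encoding).
    scores: model outputs, higher meaning "more likely a case".
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # tie counts as half a win
    return wins / (len(cases) * len(controls))


# Toy data, invented for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to a model that ranks cases and controls no better than chance, and 1.0 to a perfect ranking. The pairwise loop is quadratic in the sample size; a rank-based formulation would scale better on large datasets.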
