Developing a robust colorectal cancer (CRC) risk predictive model with the big genetic and environment related CRC data

Colorectal cancer (CRC) has become one of the most common cancers worldwide. Although the prognosis of CRC patients has improved dramatically thanks to advanced treatments and other medical improvements, the 5-year survival rate for CRC patients remains low. We hypothesize that CRC results from a complex combination of genetic and environmental factors. This study therefore collects large-scale CRC data, comprising genetic variation and environmental exposure information for CRC patients and cancer-free controls, and uses these data to train and test a predictive CRC model. Our results demonstrate that (1) the identified genetic and environmental biomarkers are validated as causes of CRC by manually reviewed experimental evidence, (2) the model can efficiently predict CRC risk after its parameters are optimized on the large CRC-related data, and (3) our novel generalized kernel recursive maximum correntropy (GKRMC) algorithm has high predictive power. Finally, we discuss why GKRMC can outperform classical regression algorithms and outline related future work.
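
As a rough illustration of why a correntropy-based criterion may be more robust than a squared-error criterion, the sketch below compares the two on a small set of prediction errors with and without a single gross outlier. It is not the GKRMC algorithm itself; it only evaluates the generalized correntropy measure as commonly defined in the robust adaptive filtering literature, with a generalized Gaussian kernel. The function names, the parameter values (`alpha`, `beta`), and the error values are hypothetical and chosen for illustration only.

```python
import math


def generalized_gaussian_kernel(e, alpha=2.0, beta=1.0):
    """Generalized Gaussian density kernel G_{alpha,beta}(e).

    alpha (shape) and beta (bandwidth) are illustrative defaults,
    not values taken from the study.
    """
    norm = alpha / (2.0 * beta * math.gamma(1.0 / alpha))
    return norm * math.exp(-abs(e / beta) ** alpha)


def generalized_correntropy(errors, alpha=2.0, beta=1.0):
    """Sample estimate of generalized correntropy: mean kernel value of the errors."""
    return sum(generalized_gaussian_kernel(e, alpha, beta) for e in errors) / len(errors)


def mean_squared_error(errors):
    """Classical squared-error criterion, shown for comparison."""
    return sum(e * e for e in errors) / len(errors)


# Hypothetical prediction errors: a clean set, and the same set plus one gross outlier.
clean = [0.1, -0.2, 0.05, 0.15]
with_outlier = clean + [50.0]

mse_ratio = mean_squared_error(with_outlier) / mean_squared_error(clean)
gc_ratio = generalized_correntropy(with_outlier) / generalized_correntropy(clean)

print("MSE grows by a factor of", mse_ratio)
print("Generalized correntropy changes by a factor of", gc_ratio)
```

Because the kernel value of a very large error is essentially zero, the outlier contributes almost nothing to the correntropy estimate, whereas it dominates the mean squared error. A criterion that maximizes correntropy is therefore less sensitive to such samples.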
