Building Up a Robust Risk Mathematical Platform to Predict Colorectal Cancer

Colorectal cancer (CRC), which arises through a multistep process driven by multiple factors, is one of the most common life-threatening cancers worldwide. Identifying "high-risk" populations is critical for early diagnosis and for improving the overall survival rate. Among the complex genetic and environmental factors, which group contributes most to colorectal carcinogenesis remains contentious. This study therefore collects relatively complete information on genetic variations and environmental exposures for both CRC patients and cancer-free controls, and uses these large data sets to train and test a multimethod ensemble model for CRC-risk prediction. Our results demonstrate that (1) the explored genetic and environmental biomarkers are linked to CRC by biological function-based or population-based evidence, (2) the model can efficiently predict CRC risk after its parameters are optimized on the large CRC-related data set, and (3) our novel heterogeneous ensemble learning model (HELM) and generalized kernel recursive maximum correntropy (GKRMC) algorithm have high predictive power. Finally, we discuss why HELM and GKRMC can outperform classical regression algorithms, and we outline related subjects for future study.
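
The abstract names the GKRMC algorithm without defining it. As a rough illustration only, the sketch below computes a sample estimate of generalized correntropy, the similarity measure that maximum-correntropy methods seek to maximize, using a generalized Gaussian kernel. The function names, the default values of `alpha` and `beta`, and the toy inputs are assumptions made for this example; this is not the authors' implementation, and it omits the kernel recursive update entirely.

```python
import math


def generalized_gaussian_kernel(e, alpha=2.0, beta=1.0):
    """Generalized Gaussian density evaluated at error e.

    alpha (shape) and beta (scale) are illustrative defaults, not values
    taken from the paper. alpha = 2 gives a Gaussian-shaped kernel.
    """
    norm = alpha / (2.0 * beta * math.gamma(1.0 / alpha))
    return norm * math.exp(-(abs(e / beta) ** alpha))


def generalized_correntropy(y_true, y_pred, alpha=2.0, beta=1.0):
    """Sample estimate: average kernel value over the prediction errors."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(
        generalized_gaussian_kernel(t - p, alpha, beta)
        for t, p in zip(y_true, y_pred)
    )
    return total / len(y_true)


# Hypothetical toy data: the second prediction set contains one gross outlier.
clean = generalized_correntropy([0.0, 1.0, 0.0], [0.1, 0.9, 0.0])
noisy = generalized_correntropy([0.0, 1.0, 0.0], [0.1, 0.9, 50.0])
```

Because the kernel is bounded, a single very large error reduces the average by at most one sample's contribution, whereas a squared-error criterion grows without bound. That boundedness is the usual motivation for correntropy-based criteria when the data contain outliers.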
