Serum levels of chemical elements in esophageal squamous cell carcinoma in Anyang, China: a case-control study based on machine learning methods

Objectives Esophageal squamous cell carcinoma (ESCC) is the predominant form of esophageal carcinoma and is highly aggressive, with a low survival rate. The risk factors for ESCC in the high-incidence areas of China remain unclear. We used machine learning methods to investigate whether altered serum levels of certain chemical elements are associated with ESCC.

Settings Primary healthcare unit in Anyang city, Henan Province, China.

Participants 100 patients with ESCC and 100 healthy controls matched for age, sex and region.

Primary and secondary outcome measures The primary outcome was classification accuracy. The secondary outcome was the p value of the t-test or rank-sum test.

Methods Both traditional statistical methods (t-test and rank-sum test) and machine learning approaches were employed.

Results Random Forest achieved the best accuracy, 98.38%, on the original feature vectors (without dimensionality reduction), and the support vector machine outperformed the other classifiers on the embedding spaces (with dimensionality reduction), with an accuracy of 96.56%. All six classifiers achieved accuracies above 90% using only the single most important element, Sr. The next two most discriminative elements were S and P, each providing accuracies of around 80%. More than half of the chemical elements differed significantly between patients with ESCC and controls.

Conclusions These results suggest clear differences between patients with ESCC and controls, pointing to potential applications in the diagnosis, prognosis, pharmacy and nutrition of ESCC. However, the results should be interpreted with caution because of the retrospective design, the limited sample size and the lack of data on several potential confounding factors (including obesity, nutritional status, fruit and vegetable consumption, and potential exposure to regional carcinogens).
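As a rough illustration of the two outcome measures named above (a p value from a t-test or rank-sum test, and a classification accuracy), the following sketch compares one element between 100 cases and 100 controls. It is not the authors' pipeline: the serum values are synthetic draws from assumed normal distributions, the rank-sum p value uses a normal approximation without tie correction, and the leave-one-out midpoint-threshold rule is a deliberately simple stand-in for the six classifiers used in the study. All function and variable names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_p(a, b):
    """Two-sided rank-sum p value (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    w = sum(ranks[:n1])                      # rank sum of the first group
    mu = n1 * (n1 + n2 + 1) / 2              # its mean under the null
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return 2 * (1 - NormalDist().cdf(abs(z)))


def loo_threshold_accuracy(cases, controls):
    """Leave-one-out accuracy of a midpoint-of-class-means threshold rule."""
    data = [(v, 1) for v in cases] + [(v, 0) for v in controls]
    correct = 0
    for i, (v, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        m1 = mean(x for x, y in train if y == 1)
        m0 = mean(x for x, y in train if y == 0)
        cut = (m1 + m0) / 2
        pred = 1 if (v > cut) == (m1 > m0) else 0
        correct += pred == label
    return correct / len(data)


# Synthetic data only: the means and spread are assumptions, not study values.
random.seed(0)
cases = [random.gauss(40.0, 8.0) for _ in range(100)]
controls = [random.gauss(70.0, 8.0) for _ in range(100)]

t_stat, dof = welch_t(cases, controls)
p_value = rank_sum_p(cases, controls)
accuracy = loo_threshold_accuracy(cases, controls)
print(f"Welch t = {t_stat:.2f} (df = {dof:.1f})")
print(f"rank-sum p = {p_value:.3g}")
print(f"leave-one-out accuracy = {accuracy:.2%}")
```

With real measurements, a statistics library would normally replace these hand-written helpers, and accuracy would be estimated with the cross-validation scheme reported in the paper rather than this single-feature rule.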
