Integrative machine learning analysis of multiple gene expression profiles in cervical cancer

Although most cervical cancer cases are reported to be closely related to Human Papillomavirus (HPV) infection, there is a need to study the genes that are differentially expressed when cervical cancer eventually develops following HPV infection. In this study, we proposed an integrative machine learning approach to analyse multiple gene expression profiles in cervical cancer in order to identify a set of genetic markers that are associated with cervical cancer and may eventually aid in its diagnosis or prognosis. The proposed integrative analysis is composed of three steps: (i) gene expression analysis of each individual dataset; (ii) meta-analysis of multiple datasets; and (iii) feature selection and machine learning analysis. As a result, 21 genes were identified through the integrative machine learning analysis, which included seven supervised methods and one unsupervised method. A functional analysis with GSEA (Gene Set Enrichment Analysis) was performed on the selected 21-gene set and showed significant enrichment in a potential nine-gene expression signature, namely PEG3, SPON1, BTD and RPLP2 (upregulated genes) and PRDX3, COPB2, LSM3, SLC5A3 and ASF1B (downregulated genes).
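The three steps can be illustrated with a minimal sketch on simulated data. It is not the pipeline used in the study. It assumes a simple difference of mean log-expression for step (i), a rank-product style combination (geometric mean of ranks) for step (ii), and a plain top-n cut as a stand-in for the feature selection and machine learning analysis in step (iii). All function names, gene labels and parameters below are hypothetical.

```python
import math
import random
from statistics import mean


def log_fold_changes(tumour, normal):
    """Step (i): per-gene difference of mean log-expression (tumour minus normal)."""
    return {g: mean(tumour[g]) - mean(normal[g]) for g in tumour}


def ranks_descending(scores):
    """Rank genes so that the largest score receives rank 1."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {g: i + 1 for i, g in enumerate(ordered)}


def rank_product(rank_tables):
    """Step (ii): geometric mean of each gene's rank across datasets."""
    k = len(rank_tables)
    return {
        g: math.exp(sum(math.log(t[g]) for t in rank_tables) / k)
        for g in rank_tables[0]
    }


def top_genes(rank_prod, n):
    """Step (iii), simplified: keep the n genes with the smallest rank product."""
    return sorted(rank_prod, key=rank_prod.get)[:n]


def simulate_dataset(rng, genes, up_genes, n_samples=8):
    """Hypothetical log-expression data; up_genes are shifted upward in tumour samples."""
    tumour = {
        g: [rng.gauss(3.0 if g in up_genes else 0.0, 0.5) for _ in range(n_samples)]
        for g in genes
    }
    normal = {g: [rng.gauss(0.0, 0.5) for _ in range(n_samples)] for g in genes}
    return tumour, normal


rng = random.Random(0)
genes = [f"gene_{i}" for i in range(20)]   # placeholder gene labels
up_genes = {"gene_3", "gene_7"}            # simulated upregulated genes

# Analyse three simulated datasets separately, then combine their rankings.
rank_tables = []
for _ in range(3):
    tumour, normal = simulate_dataset(rng, genes, up_genes)
    rank_tables.append(ranks_descending(log_fold_changes(tumour, normal)))

selected = top_genes(rank_product(rank_tables), n=2)
print(sorted(selected))
```

Combining ranks rather than raw expression values is one way to keep datasets from different platforms comparable, since each dataset contributes only the ordering of its genes. In practice, step (iii) would replace the top-n cut with proper feature selection and classifier evaluation.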
