Integrating Multiple Data Sources for Combinatorial Marker Discovery: A Study in Tumorigenesis

Identifying combinatorial markers from multiple data sources is a challenging task in bioinformatics. Here, we propose a novel computational framework for identifying significant combinatorial markers (SCMs) using both gene expression and methylation data. The gene expression and methylation data are integrated into a single continuous dataset as well as a (post-discretized) Boolean dataset based on their intrinsic (i.e., inverse) relationship. A novel combined score of methylation and expression data (viz., CoMEx) is introduced, which is computed on the integrated continuous data to identify an initial non-redundant set of genes. Thereafter, (maximal) frequent closed homogeneous genesets are identified using a well-known biclustering algorithm applied to the integrated Boolean data of the determined non-redundant set of genes. A novel sample-based weighted support (WS) is then proposed, which is subsequently calculated on the integrated Boolean data of the determined non-redundant set of genes in order to identify the non-redundant significant genesets. The top few resulting genesets are identified as potential SCMs. Since our proposed method generates fewer significant non-redundant genesets than other popular methods, it is much faster than the others.
Applying the proposed technique to expression and methylation data for uterine tumor or prostate carcinoma produces a set of significant combinatorial markers. We expect that such combinations of markers will produce fewer false positives than individual markers.
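
The integration and support steps described above can be illustrated with a minimal sketch. The abstract does not give the formulas for CoMEx or WS, so everything below is an assumption made for illustration only: the integrated score is taken as the per-gene z-scored expression minus the z-scored methylation (one simple way to encode an inverse relationship), discretization thresholds each gene at its own mean, and the weighted support is the weight fraction of samples in which every gene of a geneset is "on". The function names `integrate`, `discretize`, and `weighted_support` are hypothetical and are not the authors' implementation.

```python
import numpy as np


def zscore(matrix):
    """Standardize each gene (row) to zero mean and unit variance."""
    matrix = np.asarray(matrix, dtype=float)
    mean = matrix.mean(axis=1, keepdims=True)
    std = matrix.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant genes
    return (matrix - mean) / std


def integrate(expression, methylation):
    """Assumed integration: expression and methylation are inversely related,
    so subtract the standardized methylation from the standardized expression."""
    return zscore(expression) - zscore(methylation)


def discretize(integrated):
    """Assumed post-discretization: a gene is 'on' in a sample when its
    integrated value exceeds that gene's mean across samples."""
    return integrated > integrated.mean(axis=1, keepdims=True)


def weighted_support(boolean, geneset, sample_weights):
    """Assumed sample-based weighted support: weight fraction of samples
    in which all genes of the geneset are 'on'."""
    weights = np.asarray(sample_weights, dtype=float)
    covered = boolean[list(geneset)].all(axis=0)
    return float(weights[covered].sum() / weights.sum())


# Toy data: 2 genes (rows) x 4 samples (columns).
expression = [[1, 2, 3, 4], [4, 3, 2, 1]]
methylation = [[4, 3, 2, 1], [1, 2, 3, 4]]

boolean = discretize(integrate(expression, methylation))
support_gene0 = weighted_support(boolean, [0], [1, 1, 2, 4])
support_pair = weighted_support(boolean, [0, 1], [1, 1, 1, 1])
```

In this toy example gene 0 is "on" only in the last two samples and gene 1 only in the first two, so the pair never co-occurs and its support is zero. A real pipeline would replace each assumed step with the definitions given in the full paper.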
