Integrative Hypergraph Regularization Principal Component Analysis for Sample Clustering and Co-Expression Genes Network Analysis on Multi-Omics Data

In recent years, as cancer data have become more diverse and variable, multi-omics data have been applied in many fields. Many existing principal component analysis models can process only a single type of data, which limits their use in cancer research. This paper therefore proposes a new model, integrative principal component analysis (IPCA), that unifies multi-omics data in one framework. To further preserve the high-order manifold structure among the data, a hypergraph regularization constraint is added, giving integrative hypergraph regularization principal component analysis (IHPCA). The effectiveness of the IHPCA method is tested on four multi-omics datasets. Experimental results show that the proposed method outperforms other representative methods on sample clustering and on common expression gene (co-expression gene) network analysis.
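The abstract does not state the IHPCA objective, so the following is only a minimal sketch of the general idea and not the authors' model. It assumes (1) that the omics views share the same samples and can be integrated by simple feature-wise concatenation, (2) that the hypergraph is built with one hyperedge per sample containing that sample and its k nearest neighbours, with unit hyperedge weights, (3) the normalized hypergraph Laplacian of Zhou, Huang and Schölkopf, and (4) the closed-form eigenvector solution used by graph-Laplacian PCA. The function names, the parameters `alpha` and `k`, and the spectral-norm scaling are all illustrative choices.

```python
import numpy as np


def knn_hypergraph_laplacian(X, k=3):
    """Normalized hypergraph Laplacian for samples in the rows of X.

    Assumed construction: hyperedge j = sample j plus its k nearest
    neighbours, all hyperedge weights equal to 1.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared distances
    np.fill_diagonal(d2, -np.inf)  # guarantee each sample is in its own edge
    H = np.zeros((n, n))  # incidence matrix: vertices x hyperedges
    for j in range(n):
        members = np.argsort(d2[j])[: k + 1]
        H[members, j] = 1.0
    w = np.ones(n)            # hyperedge weights (assumed uniform)
    dv = H @ w                # vertex degrees
    de = H.sum(axis=0)        # hyperedge degrees (k + 1 each)
    dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    theta = dv_inv_sqrt @ H @ np.diag(w / de) @ H.T @ dv_inv_sqrt
    return np.eye(n) - theta, dv


def hypergraph_pca(views, n_components=2, alpha=1.0, k=3):
    """Hypergraph-regularized PCA on concatenated views (illustrative only).

    views: list of arrays, each of shape (n_samples, n_features_v).
    Returns Q (sample embedding) and U (feature loadings).
    """
    X = np.hstack(views)               # assumed integration: concatenation
    X = X - X.mean(axis=0)             # centre each feature
    L, _ = knn_hypergraph_laplacian(X, k)
    G = X @ X.T                        # sample Gram matrix
    # Scale the Gram matrix so alpha is comparable across datasets.
    M = -G / np.linalg.norm(G, 2) + alpha * L
    _, vecs = np.linalg.eigh(M)        # eigenvalues in ascending order
    Q = vecs[:, :n_components]         # smallest eigenvectors, Q^T Q = I
    U = X.T @ Q                        # loadings for the fixed Q
    return Q, U


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic "omics" views over the same 20 samples (made-up data).
    view_a = rng.normal(size=(20, 30))
    view_b = rng.normal(size=(20, 15))
    Q, U = hypergraph_pca([view_a, view_b], n_components=2, alpha=0.5, k=3)
    print(Q.shape, U.shape)
```

In a pipeline like the one the abstract describes, the rows of `Q` would be passed to a clustering routine for sample clustering, and the largest-magnitude entries of `U` would point to candidate genes for network analysis; both steps are outside this sketch.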
