Incorporating network based protein complex discovery into automated model construction

We propose a method for gene-expression-based analysis of cancer phenotypes that incorporates network biology knowledge through the unsupervised construction of computational graphs. The structure of each computational graph is determined by topological clustering algorithms applied to protein-protein interaction networks; these algorithms carry inductive biases from network biology research on protein complex discovery. The resulting structure constrains the hypothesis space of possible computational graph factorisations, whose parameters can then be learned in supervised or unsupervised task settings. Because the computational graph is sparse, it enables differential analysis of protein complex activity while also allowing the contribution of each gene or protein within a complex to be interpreted. In our experiments analysing a variety of cancer phenotypes, the proposed methods outperform SVM, fully-connected MLP, and randomly-connected MLP baselines in all tasks. Our work introduces a scalable method for incorporating large interaction networks as prior knowledge to drive the construction of powerful computational models amenable to introspective study.
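The idea of letting discovered protein complexes dictate the wiring of a sparse computational graph can be illustrated with a minimal sketch. This is not the authors' implementation: the gene names, the two toy complexes, the helper names `build_mask` and `complex_activity`, and the tanh activation are all assumptions made for illustration. In practice the complexes would come from a topological clustering algorithm run on a protein-protein interaction network.

```python
import numpy as np

def build_mask(genes, complexes):
    """Binary (n_genes, n_complexes) mask: 1 where a gene belongs to a complex."""
    index = {g: i for i, g in enumerate(genes)}
    mask = np.zeros((len(genes), len(complexes)), dtype=np.float32)
    for j, members in enumerate(complexes):
        for g in members:
            if g in index:  # ignore complex members with no expression column
                mask[index[g], j] = 1.0
    return mask

def complex_activity(x, weights, bias, mask):
    """Masked linear layer with tanh (assumed); one hidden unit per complex."""
    # Multiplying by the mask zeroes every gene-to-complex edge that the
    # clustering did not propose, so each unit only sees its member genes.
    return np.tanh(x @ (weights * mask) + bias)

# Toy inputs (hypothetical): five genes, two "discovered" complexes.
genes = ["TP53", "MDM2", "BRCA1", "BARD1", "EGFR"]
complexes = [{"TP53", "MDM2"}, {"BRCA1", "BARD1"}]

mask = build_mask(genes, complexes)
rng = np.random.default_rng(0)
weights = rng.normal(size=mask.shape).astype(np.float32)
bias = np.zeros(len(complexes), dtype=np.float32)

# Three samples of (synthetic) gene expression values.
x = rng.normal(size=(3, len(genes))).astype(np.float32)
activity = complex_activity(x, weights, bias, mask)
```

Under these assumptions, each column of `activity` can be read as a per-sample activity score for one complex, and the surviving entries of `weights * mask` give the contribution of each member gene. A gene outside every complex (here `EGFR`) has no path to the hidden units, which is what makes the layer sparse.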
