Towards Gene Expression Convolutions using Gene Interaction Graphs

We study the challenges of applying deep learning to gene expression data. We find experimentally that there is non-linear signal in the data; however, it is not discovered automatically, given the noise and the low number of samples used in most research. We discuss how gene interaction graphs (same pathway, protein-protein, co-expression, or research paper text association) can be used to impose a bias on a deep model, similar to the spatial bias that convolutions impose on an image. We explore the use of Graph Convolutional Neural Networks, coupled with dropout and gene embeddings, to exploit the graph information. We find that this approach provides an advantage for particular tasks in a low-data regime but depends heavily on the quality of the graph used. We conclude that more work should be done in this direction. We design experiments that show why existing methods fail to capture signal that is present in the data when features are added, which clearly isolates the problem that needs to be addressed.
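To make the analogy with image convolutions concrete, the following is a minimal NumPy sketch of a single graph convolution over a toy gene interaction graph. It assumes the commonly used symmetric-normalized propagation rule, which may differ from the architecture evaluated in this work. The toy adjacency matrix, the embedding size, and the names `normalized_adjacency` and `graph_conv` are illustrative assumptions, and dropout and training are omitted.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric adjacency matrix A."""
    a_hat = adj + np.eye(adj.shape[0])  # add self-loops so each gene keeps its own signal
    deg = a_hat.sum(axis=1)             # degree of each gene, self-loop included
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def graph_conv(x, a_norm, weight):
    """One graph convolution: average features over graph neighbors, project, apply ReLU.

    x: (genes, in_features), weight: (in_features, out_features).
    """
    return np.maximum(a_norm @ x @ weight, 0.0)


# Hypothetical toy graph: 4 genes connected in a chain 0-1-2-3.
adj = np.array(
    [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]],
    dtype=float,
)

rng = np.random.default_rng(0)
expression = rng.normal(size=(4, 1))  # one sample: one expression value per gene
embedding = rng.normal(size=(4, 3))   # assumed per-gene embedding (learned in practice)
x = np.concatenate([expression, embedding], axis=1)  # node features, shape (4, 4)

weight = rng.normal(size=(4, 2))      # layer weights, random here for illustration
a_norm = normalized_adjacency(adj)
hidden = graph_conv(x, a_norm, weight)  # shape (4, 2): new features per gene
```

Each gene's output depends only on itself and its direct neighbors in the graph, much as a convolution on an image mixes only nearby pixels. This is also why an uninformative graph would mix unrelated genes and could hurt rather than help.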