Learning the Structural Vocabulary of a Network

Networks have become instrumental in deciphering how information is processed and transferred within systems in almost every scientific field. Nearly all network analyses, however, rely on humans to devise the structural features believed to be most discriminative for a given application. We present a deep learning framework for comparing and classifying networks without human-crafted features. After training, the hidden units of an autoencoder encode a robust structural vocabulary for succinctly describing graphs. We use this feature vocabulary to tackle several network mining problems and find improved predictive performance compared with many popular features in use today. These problems include uncovering the growth mechanisms that drive network evolution, predicting protein network fragility, and identifying environmental niches for metabolic networks. Deep learning offers a principled approach for mining complex networks and tackling graph-theoretic problems.
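To make the idea of hidden units acting as a learned feature vocabulary concrete, the following is a minimal sketch and not the framework's actual architecture, input encoding, or training procedure, none of which are specified here. It assumes small undirected graphs represented by the flattened upper triangle of their adjacency matrices, a single sigmoid hidden layer, and plain full-batch gradient descent. The random graphs, the layer sizes, and the helper names (random_graph_vector, train_autoencoder, encode) are all hypothetical choices made for illustration.

```python
import numpy as np

def random_graph_vector(n, p, rng):
    """Hypothetical input encoding: flattened upper triangle of a random
    undirected graph on n nodes where each edge is present with probability p."""
    n_pairs = n * (n - 1) // 2
    return (rng.random(n_pairs) < p).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, epochs=300, lr=0.2, seed=0):
    """Train a one-hidden-layer autoencoder with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_samples, n_in = X.shape
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
    b2 = np.zeros(n_in)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)   # hidden code for each graph
        Y = sigmoid(H @ W2 + b2)   # reconstruction of the input vector
        losses.append(np.mean((Y - X) ** 2))
        # Gradient of half the squared error per graph, averaged over graphs
        # (proportional to the gradient of the MSE recorded above).
        dY = (Y - X) * Y * (1.0 - Y) / n_samples
        dH = (dY @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dY)
        b2 -= lr * dY.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return (W1, b1, W2, b2), losses

def encode(X, params):
    """Hidden-unit activations, used here as learned structural features."""
    W1, b1, _, _ = params
    return sigmoid(X @ W1 + b1)

# Toy data (an assumption): 50 sparse and 50 dense random graphs on 8 nodes.
rng = np.random.default_rng(42)
X = np.array([random_graph_vector(8, 0.1, rng) for _ in range(50)] +
             [random_graph_vector(8, 0.5, rng) for _ in range(50)])

params, losses = train_autoencoder(X, n_hidden=4)
features = encode(X, params)   # one 4-dimensional feature vector per graph
```

In this sketch, each row of features could be passed to any standard classifier in place of hand-picked statistics such as degree distributions or motif counts. Whether features learned this way are actually discriminative depends on the data and the model, which this toy example does not attempt to show.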
