Network Clustering Analysis Using Mixture Exponential-Family Random Graph Models and Its Application in Genetic Interaction Data

Motivation: Epistatic miniarray profile (EMAP) studies have enabled the mapping of large-scale genetic interaction networks and generated large amounts of data in model organisms. EMAP provides a powerful set of molecular tools and technologies for efficiently studying the relationship between the genotypes and phenotypes of individuals. However, the network information gained from EMAP cannot be fully exploited using traditional statistical network models, because a genetic network is typically heterogeneous: for example, the structural features of one subset of nodes differ from those of the remaining nodes. Exponential-family random graph models (ERGMs) are a family of statistical models that provide a principled and flexible way to describe the structural features (e.g., the density, centrality, and assortativity) of an observed network. However, a single ERGM is not enough to capture this heterogeneity. In this paper, we consider a mixture ERGM (MixtureERGM), which models a network with several communities, where each community is described by a single ERGM.

Results: The EM algorithm is a classical method for fitting mixture models; however, it becomes very slow when the data size is large, as it is in many applications. We adopt an efficient, novel online graph clustering algorithm to classify the graph nodes and estimate the ERGM parameters of the MixtureERGM. In comparison studies, the MixtureERGM outperforms role analysis, a network clustering approach in which a mixture of exponential-family random graph models is fitted to many ego-networks according to their roles. One genetic interaction network of yeast and two real social networks (provided as supplemental materials, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2017.2743711) show the wide potential application of the MixtureERGM.
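
For orientation, the standard ERGM form and one common way of writing a finite mixture of ERGMs are sketched below. The abstract does not give its notation, so every symbol here is an assumption made for illustration: s(y) is a vector of network statistics, θ the parameter vector, κ(θ) the normalizing constant, K the number of communities, π_k the mixing proportions, and y_i a local sub-network associated with node i. The mixture line is a generic formulation, not necessarily the exact likelihood used in the paper.

```latex
% Single ERGM for an observed network y (standard form)
P_{\theta}(Y = y) = \frac{\exp\{\theta^{\top} s(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} s(y')\}

% Generic K-component mixture of ERGMs (illustrative notation)
P(y_i) = \sum_{k=1}^{K} \pi_k \,
         \frac{\exp\{\theta_k^{\top} s(y_i)\}}{\kappa(\theta_k)},
\qquad
\pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```

Under this reading, clustering amounts to assigning each node to the component k whose parameters θ_k best explain its local structure. The sum defining κ(θ) ranges over all possible graphs, which makes it intractable for all but very small networks and is one reason ERGM fitting is computationally costly.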
