Paper information - Closure, connectivity and degrees: New specifications for exponential random graph (p*) models for directed social networks

Closure, connectivity and degrees: New specifications for exponential random graph (p*) models for directed social networks

The new higher order specifications for exponential random graph models introduced by Snijders, Pattison, Robins & Handcock (2006) exhibit dramatic improvements in model fit compared with the commonly used Markov random graphs. Snijders et al briefly presented versions of these new specifications for directed graphs, in particular a directed alternating k-triangle parameter, based on closure of multiple two-paths. In this paper, we present a number of additional higher order parameters for directed graphs. Most importantly, we propose three new triadic-based parameters to represent different versions of triadic closure: cyclic effects; transitivity based on shared choices of partners; and transitivity based on shared popularity. We also introduce corresponding parameters for multiple connectivity effects. We propose some fifty graph features to be investigated in goodness of fit diagnostics for these new parameters. As empirical illustrations, we develop models for two sets of organizational network data, to show that the new parameters help to represent the data well. The first example is a trust network within a training group, and the second a “work difficulty” network within a government instrumentality. In the first example we show that our additional parameters are necessary to obtain an acceptable model for the data. The second example is novel in fitting a statistical model, and inferring structural processes, for a negative tie network. Using this second example, we show how the incorporation of additional effects (the number of sources and sinks in the network, and the correlation between the in- and out-degree distributions) can improve representation of the degree distribution.
The final model acceptably replicates the negative tie network in terms of: statistics related to twenty different graph configurations; the in- and out-degree distributions, including their correlation; seven different graph clustering coefficients; the triad census; and the geodesic distribution. Model interpretation emphasizes the importance of some nodes receiving high numbers of negative ties.

Exponential random graph models are the most effective statistical approach for modeling a single network observation. This class of models was introduced by Frank and Strauss (1986) with their Markov random graph models, which were elaborated and popularized as p* models in the 1990s (Pattison & Wasserman, 1999; Robins, Pattison & Wasserman, 1999; Wasserman & Pattison, 1996; see Wasserman & Robins, 2005, for a review). Markov random graphs are based on the dependence assumption that two possible network edges are conditionally independent unless they share a node. This seemingly simple assumption in fact results in a highly complex parameter space, one that is problematic in modeling most observed social networks (see Handcock, 2002; Snijders, 2002; Snijders, Pattison, Robins & Handcock, 2006; Robins, Snijders, Wang, Handcock & Pattison, 2006). Using a more complex partial dependence assumption, Snijders et al (2006) introduced new specifications for exponential random graph models, intended to circumvent some of the problems of Markov random graphs (see also Hunter & Handcock, 2006). In particular, these new specifications include higher order parameterization of triangulation effects, as well as effects for degree-based processes and multiple connectivity. Inclusion of these higher order effects has resulted in much improved model performance, in terms both of obtaining convergent parameter estimates and of improving goodness of fit of the models (Goodreau, 2006; Robins et al, 2006).
Based on his experience of using the new models with large network data sets, Goodreau (2006) concluded that the new specifications represented a major advance in the field of statistical network analysis.

To date, work on the new models (Goodreau, 2006; Hunter, 2006; Hunter & Handcock, 2006; Robins et al, 2006; Snijders et al, 2006) has concentrated on non-directed graphs. For directed networks, Snijders et al (2006) proposed specifications that were counterparts of the parameters in the non-directed models. This paper reviews, but also generalizes, these specifications for directed graphs. Most importantly, we present three new parameters related to triadic network closure (in addition to the one parameter originally proposed by Snijders et al, 2006). We show that models with the additional closure parameters can improve goodness of fit and that model interpretation is subtly different for the different parameters. We also present counterpart parameters to represent multiple connectivity. Using these new parameters, we show by empirical example how building a model through the addition of various effects may not only improve representation of the patterns of closure and connectivity in the data, but may also assist with the modeling of the degree distribution and other features of the graph.

The article is structured as follows. We begin by presenting the general form of the exponential random graph model and review the more familiar non-directed versions of the new specifications. We review the general diagnostic method to examine goodness of fit. After presenting the directed specifications of Snijders et al (2006), we introduce the three new additional network closure parameters, and counterparts representing multiple connectivity, and discuss interpretation. We propose a range of directed graph features for diagnostic examination of goodness of fit. We present two empirical examples.
In the first, a network of positive trust ties, we show that the Snijders et al (2006) specifications for directed graphs do not lead to a stable model, but incorporation of the new parameters successfully overcomes this problem. Our other empirical illustration is the fitting of a statistical model to a negative tie network, in this case a “work difficulty” network. We show that the new parameters are necessary to represent closure in this network, and that the incorporation of further effects into the model successfully replicates the degree distribution as well. We conclude with general comments about fitting these models to directed networks and discuss further work.

Exponential random graph models

We use standard notation and terminology (Robins et al, 2006). For each pair $i$ and $j$ of a set $N$ of $n$ actors, $Y_{ij}$ is a network tie variable with $Y_{ij} = 1$ if there is a network tie from $i$ to $j$, and $Y_{ij} = 0$ otherwise. The observed value of $Y_{ij}$ is $y_{ij}$, with $Y$ the matrix of all variables and $y$ the matrix of observed ties (the network). $Y$ may be directed or non-directed. A configuration is a small possible subgraph for which there is a parameter in the model. In broad terms, an exponential random graph model implies that the network is built up from combinations of these small configurations. The statistical basis of the model permits inferences about which configurations are important, allowing for the other effects in the model. Configurations may be interpreted as outcomes of structural processes in the network, so that the model assists judgments about those structural processes that are sufficient to explain how the network came to be. The dependence assumption delimits the possible configurations in the model. For instance, the Markov dependence assumption (reviewed below) implies that the only configurations in the model relate to edges, stars of various sizes, and triangles (Frank & Strauss, 1986).
The general form of the class of (homogeneous) exponential random graph models is then as follows:

$$\Pr(Y = y) = \frac{1}{\kappa} \exp\Big\{ \sum_{A} \eta_A z_A(y) \Big\} \tag{1}$$
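
To make the general form (1) concrete, the following minimal Python sketch evaluates it by brute force on a very small directed graph. It takes the usual reading of the symbols, which the excerpt itself does not spell out: η_A is the parameter for configuration A, z_A(y) is the count of that configuration in y, and κ is the normalizing constant obtained by summing over all possible graphs. The three statistics (arcs, mutual dyads, cyclic triads) and all function names are illustrative assumptions, not the specifications proposed in the paper, and full enumeration is feasible only for a handful of nodes.

```python
import itertools
import math

def arcs(y):
    """Number of directed ties in the 0/1 adjacency matrix y."""
    return sum(map(sum, y))

def mutual(y):
    """Number of reciprocated dyads (i -> j and j -> i)."""
    n = len(y)
    return sum(y[i][j] * y[j][i] for i in range(n) for j in range(i + 1, n))

def cyclic_triads(y):
    """Number of 3-cycles i -> j -> k -> i, each counted once."""
    n = len(y)
    total = sum(y[i][j] * y[j][k] * y[k][i]
                for i, j, k in itertools.permutations(range(n), 3))
    return total // 3  # every cycle is visited under its 3 rotations

def all_digraphs(n):
    """Yield every directed graph on n nodes (no self-ties)."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        y = [[0] * n for _ in range(n)]
        for (i, j), b in zip(pairs, bits):
            y[i][j] = b
        yield y

def ergm_probability(y, eta, stats, n):
    """Pr(Y = y) = exp(sum_A eta_A * z_A(y)) / kappa, kappa by enumeration."""
    def weight(g):
        return math.exp(sum(e * z(g) for e, z in zip(eta, stats)))
    kappa = sum(weight(g) for g in all_digraphs(n))  # normalizing constant
    return weight(y) / kappa

# Hypothetical example: a single directed 3-cycle on three actors.
y_obs = [[0, 1, 0],
         [0, 0, 1],
         [1, 0, 0]]
# Illustrative parameter values only; they are not estimates from any data.
p = ergm_probability(y_obs, [-1.0, 0.5, 0.8], [arcs, mutual, cyclic_triads], 3)
```

With only the arc statistic in the model, the probability factorizes over tie variables and the sketch reduces to independent Bernoulli ties, which is a convenient sanity check. For networks of realistic size κ cannot be enumerated, which is why simulation-based estimation is used in practice.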

[1] David R. Hunter, et al. Curved exponential family models for social networks, 2007, Soc. Networks.

[2] P. Holland, et al. A Method for Detecting Structure in Sociometric Data, 1970, American Journal of Sociology.

[3] Stanley Wasserman, et al. Social Network Analysis: Methods and Applications, 1994, Structural analysis in the social sciences.

[4] P. Pattison, et al. New Specifications for Exponential Random Graph Models, 2006.

[5] Tom A. B. Snijders, et al. Markov Chain Monte Carlo Estimation of Exponential Random Graph Models, 2002, J. Soc. Struct.

[6] P. Pattison, et al. Neighborhood-Based Models for Social Networks, 2002.

[7] Steven M. Goodreau, et al. Advances in exponential random graph (p*) models applied to a large social network, 2007, Soc. Networks.

[8] L'Annee Sociologique, 1953.

[9] S. Wasserman, et al. Models and Methods in Social Network Analysis: An Introduction to Random Graphs, Dependence Graphs, and p*, 2005.

[10] S. Wasserman, et al. Logit models and logistic regressions for social networks: I. An introduction to Markov graphs and p*, 1996.

[11] S. Wasserman, et al. Logit models and logistic regressions for social networks: II. Multivariate relations, 1999, The British Journal of Mathematical and Statistical Psychology.

[12] Emmanuel Lazega, et al. Multiplexity, generalized exchange and cooperation in organizations: a case study, 1999, Soc. Networks.

[13] D. Hunter, et al. Inference in Curved Exponential Family Models for Networks, 2006.

[14] R. Breiger, et al. Generalized exchange in social networks: Statistics and structure, 1997.

[15] S. Wasserman, et al. Logit models and logistic regressions for social networks: III. Valued relations, 1999.

[16] Peng Wang, et al. Recent developments in exponential random graph (p*) models for social networks, 2007, Soc. Networks.