How Correlated Are Network Centrality Measures?

Calculating centrality has been a major focus of social network analysis research for some time (Freeman, 1979). Textbooks and reference volumes on social networks typically include a chapter on centrality concepts and calculations (e.g., Degenne & Forse, 1999; Scott, 2000; Wasserman & Faust, 1994). At least eight centrality measures have been proposed and made available in UCINET 6 (Borgatti et al., 2005): degree, betweenness, closeness, eigenvector, power, information, flow, and reach. Perhaps the most frequently used are degree, closeness, betweenness, and eigenvector. The first three were proposed by Freeman (1979) and eigenvector was proposed by Bonacich (1972). Centrality is important because it indicates who occupies critical positions in the network. Central positions have often been equated with opinion leadership or popularity, both of which have been shown to be associated with adoption behaviors (Becker, 1970; Rogers, 2003; Valente, 1995; Valente & Davis, 1999). Typically, investigators use only the degree measure of centrality (simply the number of links a person has), because it is the easiest to explain to audiences unfamiliar with network analysis and its association with behavior is intuitive. An often asked, yet rarely answered, question has been: Are these centrality measures correlated? All centrality measures are derived from the adjacency matrix and so constitute different mathematical computations on the same underlying data. If the measures are highly correlated, then the development of multiple measures may be somewhat redundant, and we can expect the different measures to behave similarly in statistical analyses. If, on the other hand, the measures are not highly correlated, they capture distinct properties and are likely to be associated with different outcomes. Previous studies have examined correlations among centrality measures.
One study examined correlations between degree, closeness, betweenness, and flow, and also examined these relationships under conditions of random error, systematic error, and incomplete data (Bolland, 1988). Overall, degree, closeness, and continuing flow centrality were strongly intercorrelated, while betweenness remained relatively uncorrelated with the other three measures (Bolland, 1988). In a network study of individuals connected through participation in HIV risk behaviors, Rothenberg and colleagues (1995) examined relationships among eight centrality measures: three forms of information centrality, three distance measures (i.e., eccentricity, mean, and median), and degree and betweenness centrality. Their analyses showed these eight centrality measures to be highly correlated, with a few notable distinctions. The three distance measures were highly interrelated and were also strongly correlated with the three information measures, although less so with degree and betweenness. The latter two measures, degree and betweenness, were highly correlated with each other, although less so with the information measures. The information measures were also highly correlated with one another. In another study, Valente and Foreman (1998) examined correlations between measures of integration and radiality, other centrality measures, and personal network density. Using data from the Sampson Monastery dataset (1969) and the Medical Innovations study (Coleman et al., 1966; Burt, 1987), they found that integration was most highly and positively correlated with in-degree centrality, positively correlated with closeness, betweenness, and flow, and negatively correlated with density (Valente & Foreman, 1998). In comparison, radiality was significantly and negatively correlated with out-degree, but only in the Medical Innovations dataset.
Lastly, Faust (1997) examined correlations among centrality measures using a subset of the data from Galaskiewicz’s study (1985) regarding relationships between CEOs, clubs and boards. Faust (1997) found correlations ranging from .89 to .99 among centrality measures including degree, closeness, betweenness, the centrality of an event, and flow betweenness for the identification of central clubs. In this manuscript, we empirically investigate the correlation among the four centrality measures that we felt were most commonly used by network analysts: degree, betweenness, closeness, and eigenvector. Degree and closeness are directional measures, so we calculate both in-degree and out-degree, and both in-closeness and out-closeness. Closeness was calculated by inverting the distance matrix and taking the row average for closeness-out and the column average for closeness-in (Freeman, 1979). Nodes that were disconnected were given a distance of N-1 so that distances could be calculated. We also calculated closeness based on reversed distances (so-called integration/radiality) but found these measures to be largely redundant with closeness based on inverting distances (Valente & Foreman, 1998). Betweenness indicates how frequently a node lies along geodesic pathways of other nodes in the network, and therefore is an inherently asymmetric measure. Eigenvector centrality can only be calculated on a symmetric network, so matrices have to be symmetrized before it is calculated. Comparing eigenvector centrality to the other three measures thus requires that degree, closeness, and betweenness be calculated on symmetric data as well. Degree, betweenness, eigenvector, and closeness are all measures of an actor’s prominence in a network (Wasserman & Faust, 1994). While considerable conceptual overlap exists between these constructs, they also may be conceptually distinct.
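The closeness calculation described above can be sketched in a few lines of Python. This is an illustrative reading rather than the code used in the study: it assumes that "inverting the distance matrix" means taking the element-wise reciprocal of each geodesic distance, that averages are taken over the other N-1 nodes, and that symmetrizing means treating a tie in either direction as present. The function names (`symmetrize`, `geodesic_distances`, `closeness`) are hypothetical.

```python
from collections import deque


def symmetrize(adj):
    """Assumed rule: keep a tie if it exists in either direction."""
    n = len(adj)
    return [[1 if i != j and (adj[i][j] or adj[j][i]) else 0
             for j in range(n)] for i in range(n)]


def geodesic_distances(adj):
    """Shortest-path lengths by breadth-first search.

    Pairs with no connecting path keep the default distance N-1,
    following the convention stated in the text.
    """
    n = len(adj)
    dist = [[n - 1] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        seen = {source}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and v not in seen:
                    seen.add(v)
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


def closeness(adj):
    """Return (closeness_out, closeness_in) for a network with N >= 2 nodes.

    Assumption: each off-diagonal distance is inverted (1/d), then row
    means give closeness-out and column means give closeness-in.
    """
    n = len(adj)
    dist = geodesic_distances(adj)
    inv = [[0.0 if i == j else 1.0 / dist[i][j] for j in range(n)]
           for i in range(n)]
    out_c = [sum(inv[i]) / (n - 1) for i in range(n)]
    in_c = [sum(inv[i][j] for i in range(n)) / (n - 1) for j in range(n)]
    return out_c, in_c


# Toy directed star: node 0 sends a tie to each of nodes 1, 2, and 3.
star = [[0, 1, 1, 1],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0]]

out_c, in_c = closeness(star)
sym_c, _ = closeness(symmetrize(star))
```

In this toy star the hub has the highest closeness-out (1.0) but the lowest closeness-in (1/3), which shows why the directed versions are computed separately. After symmetrizing, the hub scores 1.0 and each leaf 2/3.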
For example, a node in the center of a star or wheel is the most central node in the network by all centrality measures (Freeman, 1979). In other network configurations, however, nodes with high degree centrality are not necessarily the most strategically located. One way to characterize such distinctions among these constructs is in terms of how actors who occupy positions high on each type of centrality transmit influence to other actors in a network. We might expect that the pathway of influence transmitted from nodes high in degree and closeness centrality will be similar. Both can quickly transmit information and influence through direct or short paths to others and interact with many others directly. Closeness measures are based on the ideas of efficiency and independence (Friedkin, 1991). Because they are situated close to others in the network, actors high on closeness measures are able to transmit information efficiently and have independence in the sense that they do not need to seek information from other, more peripheral actors. Betweenness centrality measures the extent to which an actor lies between other actors on their geodesics. Actors high on betweenness centrality, therefore, have the potential to influence others near them in a network (Friedkin, 1991), seemingly through both direct and indirect pathways. A node with high betweenness centrality can potentially influence the spread of information through the network by facilitating, hindering, or even altering the communication between others (Freeman, 1979; Newman, 2003). Similarly, those high on eigenvector centrality are linked to well-connected actors and so may influence many others in the network either directly or indirectly through their connections. We expect that measures of degree and closeness centrality will be more highly correlated with each other than with other measures, because they are both based on direct ties.
We are unsure, however, how the other centrality measures will correlate with one another. Conceptually, each centrality measure represents a different process by which key players might influence the flow of information through a social network. In this study we examine the correlation between the symmetrized and directed versions of four centrality measures: symmetrized degree, in-degree, and out-degree; symmetrized betweenness and betweenness; symmetrized closeness, closeness-in, and closeness-out; and eigenvector (symmetric only). We calculated these nine centrality measures for 58 existing social networks (from seven separate studies) analyzed previously by Costenbader and Valente (2003). We correlated the nine measures for each network and then calculated the average correlation, standard deviation, and range across centrality measures. We also calculated the overall correlation and compared it by study to assess the degree of variation in average correlation between studies. Lastly, we explore the associations between four different sociometric network properties (i.e., density, reciprocity, centralization, and number of components) and the centrality correlations. This last analysis seeks to determine whether centrality measures are more highly correlated in dense or sparse networks, in reciprocal or non-reciprocal networks, in centralized or decentralized networks, and in networks with few or many components. Density is the number of ties in the network divided by the total possible number of ties (N*(N−1)). Reciprocity was measured as the percent of possible ties that are symmetric. Degree centralization was measured using Freeman’s (1979) formula. The number of components in the network was determined by symmetrizing the network and calculating components.
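The four network properties defined above, and the correlation between a pair of centrality measures, can be illustrated with a short Python sketch. It is a sketch under stated assumptions and not the procedure used in the study: reciprocity is read here as the share of the N*(N−1) possible ties that are present in both directions, degree centralization applies Freeman's formula to the symmetrized network, and the correlation is taken to be Pearson's r. All function names and the toy network are hypothetical.

```python
from math import sqrt


def symmetrize(adj):
    """Assumed rule: keep a tie if it exists in either direction."""
    n = len(adj)
    return [[1 if i != j and (adj[i][j] or adj[j][i]) else 0
             for j in range(n)] for i in range(n)]


def density(adj):
    """Observed ties divided by the N*(N-1) possible directed ties."""
    n = len(adj)
    ties = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def reciprocity(adj):
    """Assumed reading: possible ties that exist and are returned."""
    n = len(adj)
    mutual = sum(1 for i in range(n) for j in range(n)
                 if i != j and adj[i][j] and adj[j][i])
    return mutual / (n * (n - 1))


def degree_centralization(adj):
    """Freeman's index: sum of (max degree - degree) over (N-1)(N-2).

    Assumption: computed on the symmetrized network; requires N >= 3.
    """
    sym = symmetrize(adj)
    n = len(sym)
    degrees = [sum(row) for row in sym]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))


def count_components(adj):
    """Number of connected components of the symmetrized network."""
    sym = symmetrize(adj)
    n = len(sym)
    seen = set()
    components = 0
    for start in range(n):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if sym[u][v] and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return components


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Toy network: ties 0->1, 1->0, 0->2, 0->3, 3->0; node 4 is an isolate.
net = [[0, 1, 1, 1, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]]

n = len(net)
out_degree = [sum(row) for row in net]
in_degree = [sum(net[i][j] for i in range(n)) for j in range(n)]
r_in_out = pearson(in_degree, out_degree)

properties = {
    "density": density(net),                          # 5 / 20 = 0.25
    "reciprocity": reciprocity(net),                  # 4 / 20 = 0.2
    "centralization": degree_centralization(net),     # 9 / 12 = 0.75
    "components": count_components(net),              # 2
}
```

Repeating the `pearson` step for every pair of the nine measures within a network, and then averaging each pairwise correlation across networks, would mirror the analysis outlined above. The resulting correlations could then be related to the values in `properties`.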

[1]  Stephen P. Borgatti,et al.  Centrality and network flow , 2005, Soc. Networks.

[2]  Thomas W. Valente,et al.  The stability of centrality measures when networks are sampled , 2003, Soc. Networks.

[3]  E. Rogers,et al.  Diffusion of innovations , 1964, Encyclopedia of Sport Management.

[4]  Mark E. J. Newman,et al.  Ego-centered networks and the ripple effect , 2001, Soc. Networks.

[5]  T. Valente,et al.  Accelerating the Diffusion of Innovations Using Opinion Leaders , 1999 .

[6]  Alain Degenne,et al.  Introducing Social Networks , 1999 .

[7]  T. Valente,et al.  Integration and radiality: Measuring the extent of an individual's connectedness and reachability in a network , 1998 .

[8]  E. Lazega,et al.  Position in formal structure, personal characteristics and choices of advisors in a law firm: A logistic regression model for dyadic network data , 1997 .

[9]  T. Valente,et al.  Social network associations with contraceptive use among Cameroonian women in voluntary associations. , 1997, Social science & medicine.

[10]  Katherine Faust.  Centrality in affiliation networks , 1997 .

[11]  T. Valente,et al.  Network models of the diffusion of innovations , 1995, Comput. Math. Organ. Theory.

[12]  Richard Rothenberg,et al.  Choosing a centrality measure: Epidemiologic correlates in the Colorado Springs study of social networks☆ , 1995 .

[13]  John Scott.  Social Network Analysis , 1988 .

[14]  Noah E. Friedkin,et al.  Theoretical Foundations for Centrality Measures , 1991, American Journal of Sociology.

[15]  J. Bolland,et al.  Sorting out centrality: An analysis of the performance of four centrality models in real and simulated networks , 1988 .

[16]  R. Burt Social Contagion and Innovation: Cohesion versus Structural Equivalence , 1987, American Journal of Sociology.

[17]  J. Galaskiewicz Social Organization in an Urban Grants Economy , 1985 .

[18]  Everett M. Rogers,et al.  Communication Networks: Toward a New Paradigm for Research , 1980 .

[19]  L. Freeman Centrality in social networks conceptual clarification , 1978 .

[20]  P. Bonacich TECHNIQUE FOR ANALYZING OVERLAPPING MEMBERSHIPS , 1972 .

[21]  M. Becker,et al.  Sociometric Location and Innovativeness: Reformulation and Extension of the Diffusion Model , 1970 .

[22]  J. Coleman,et al.  Medical Innovation: A Diffusion Study. , 1967 .