A clustering coefficient for weighted networks, with application to gene expression data

The clustering coefficient has been used successfully to summarise important features of unweighted, undirected networks across a wide range of applications in complexity science. Recently, a number of authors have extended this concept to the case of networks with non-negatively weighted edges. After reviewing various alternatives, we focus on a definition due to Zhang and Horvath that can be traced back to earlier work of Grindrod. We give a natural and transparent derivation of this clustering coefficient and then analyse its properties. One attraction of this version is that it deals directly with weighted edges and avoids the need to discretise, that is, to round weights up to 1 or down to 0. This has the advantages of (a) retaining all edge weight information, and (b) eliminating the requirement for an arbitrary cutoff level. Further, the extended definition is much less likely to break down due to a ‘divide-by-zero’. Using our new derivation and focusing on some special cases allows us to gain insights into the typical behaviour of this measure. We then illustrate the idea by computing the generalised clustering coefficients, along with the corresponding weighted degrees, for pairwise correlation gene expression data arising from microarray experiments. We find that the weighted clustering and degree distributions reveal global topological differences between normal and tumour networks.
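As an illustration of a clustering coefficient that works directly on edge weights, the following is a minimal sketch in plain Python. It assumes the commonly quoted form of the Zhang and Horvath definition, which the abstract itself does not state: for a symmetric weight matrix with zero diagonal and entries in [0, 1], the coefficient of node i is the sum of w_ij * w_jk * w_ki over ordered pairs j != k, divided by (sum_j w_ij)^2 - sum_j w_ij^2. The function names `weighted_clustering` and `weighted_degree` are hypothetical and chosen only for this example.

```python
import math


def weighted_degree(W):
    """Weighted degree of each node: the sum of its incident edge weights."""
    n = len(W)
    return [sum(W[i][j] for j in range(n) if j != i) for i in range(n)]


def weighted_clustering(W):
    """Per-node weighted clustering coefficient (assumed Zhang-Horvath form).

    W is a symmetric list-of-lists weight matrix with entries in [0, 1].
    Diagonal entries are ignored. No weights are rounded to 0 or 1.
    """
    n = len(W)
    coeffs = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Numerator: total weight of triangles through node i,
        # counted over ordered pairs (j, k) with j != k.
        num = sum(W[i][j] * W[j][k] * W[k][i]
                  for j in others for k in others if j != k)
        # Denominator: sum over ordered pairs j != k of w_ij * w_ik,
        # written as (sum of weights)^2 minus the sum of squared weights.
        s1 = sum(W[i][j] for j in others)
        s2 = sum(W[i][j] ** 2 for j in others)
        den = s1 * s1 - s2
        # The coefficient is undefined when node i has fewer than two
        # neighbours with nonzero weight; report NaN in that case.
        coeffs.append(num / den if den > 0 else math.nan)
    return coeffs
```

With all weights equal to 1 on a triangle, this reduces to the usual unweighted value of 1 at every node. With all weights equal to 0.5 it gives 0.5, so the measure responds to edge strength without any cutoff. The denominator vanishes only when a node has at most one neighbour with nonzero weight, which is rarer in a densely weighted network than the degree-below-two case of the unweighted definition.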

[1]  Raya Khanin,et al.  How Scale-Free Are Biological Networks , 2006, J. Comput. Biol..

[2]  Gary Hardiman,et al.  Microarray platforms--comparisons and contrasts. , 2004, Pharmacogenomics.

[3]  T. Poggio,et al.  Multiclass cancer diagnosis using tumor gene expression signatures , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[4]  Sangsoo Kim,et al.  Differential coexpression analysis using microarray data and its application to human cancer , 2005 .

[5]  Uc San Francisco,et al.  Microarray Gene Expression Data with Linked Survival Phenotypes: Diffuse Large-B-Cell Lymphoma Revisited , 2005 .

[6]  D. Higham Spectral Reordering of a Range-Dependent Weighted Random Graph , 2005 .

[7]  Mark E. J. Newman,et al.  The Structure and Function of Complex Networks , 2003, SIAM Rev..

[8]  Joseph T. Chang,et al.  Spectral biclustering of microarray data: coclustering genes and conditions. , 2003, Genome research.

[9]  J. Mesirov,et al.  Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. , 1999, Science.

[10]  Dorothea Wagner,et al.  Approximating Clustering Coefficient and Transitivity , 2005, J. Graph Algorithms Appl..

[11]  K. Kaski,et al.  Intensity and coherence of motifs in weighted complex networks. , 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[12]  Adil M. Bagirov,et al.  New algorithms for multi-class cancer diagnosis using tumor gene expression signatures , 2003, Bioinform..

[13]  Joshua M. Stuart,et al.  A gene co-expression network for global discovery of conserved genetic modules in H. sapiens, D. melanogaster, C. elegans, and S. cerevisiae , 2003 .

[14]  Jacques Rougemont,et al.  DNA microarray data and contextual analysis of correlation graphs , 2003, BMC Bioinformatics.

[15]  Gabriela Kalna,et al.  Spectral analysis of two-signed microarray expression data. , 2007, Mathematical medicine and biology : a journal of the IMA.

[16]  D. Botstein,et al.  Gene expression patterns in human liver cancers. , 2002, Molecular biology of the cell.

[17]  R. Tibshirani,et al.  Gene expression patterns of breast carcinomas distinguish tumor subclasses with clinical implications , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[18]  Jesús M. González-Barahona,et al.  Applying Social Network Analysis to the Information in CVS Repositories , 2004, MSR.

[19]  Jennifer A. Scott,et al.  HSL MC73: A fast multilevel Fiedler and profile reduction code , Technical Report RAL-TR-2003-036 , 2003 .

[20]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[21]  P. Grindrod Range-dependent random graphs and their application to modeling large small-world Proteome datasets. , 2002, Physical review. E, Statistical, nonlinear, and soft matter physics.

[22]  Ronald W. Davis,et al.  Quantitative Monitoring of Gene Expression Patterns with a Complementary DNA Microarray , 1995, Science.

[23]  A. Vespignani,et al.  The architecture of complex weighted networks. , 2003, Proceedings of the National Academy of Sciences of the United States of America.

[24]  Johan T den Dunnen,et al.  A common reference for cDNA microarray hybridizations. , 2002, Nucleic acids research.

[26]  A.-L. Barabási, R. Albert  Emergence of scaling in random networks , 1999, Science.

[27]  Ash A. Alizadeh,et al.  Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling , 2000, Nature.

[28]  Igor Jurisica,et al.  Modeling interactome: scale-free or geometric? , 2004, Bioinform..

[29]  S. Horvath,et al.  A General Framework for Weighted Gene Co-Expression Network Analysis , 2005, Statistical applications in genetics and molecular biology.