Network Clustering

Clustering can be loosely defined as the process of grouping objects into sets called clusters, so that each cluster consists of elements that are similar in some way. The similarity criterion can be defined in several ways, depending on the application of interest and the objectives that the clustering aims to achieve. For example, in distance-based clustering (see Figure 1), two or more elements belong to the same cluster if they are close with respect to a given distance metric. In conceptual clustering, by contrast, which can be traced back to Aristotle and his work on classifying plants and animals, the similarity of elements is based on descriptive concepts. Clustering serves multiple purposes, including finding “natural” clusters (modules) and describing their properties, classifying the data, and detecting unusual data objects (outliers). In addition, treating a cluster or one of its elements as a single representative unit allows us to achieve data reduction. Network clustering, which is the subject of this chapter, deals with clustering data represented as a network, or graph. Indeed, many data types can be conveniently modeled using graphs; this modeling process is sometimes called link analysis. Data points are represented by vertices, and an edge joins two vertices if the corresponding data points are similar or related in a certain way. It is important
