Mining the Largest Dense Vertexlet in a Weighted Scale-free Graph

An important knowledge discovery problem that has recently emerged for various real-life networks is identifying the largest set of vertices that are functionally associated. The topology of many real-life networks is scale-free: the vertex degrees of the underlying graph follow a power-law distribution. Moreover, the graphs corresponding to most real-life networks are weighted. This article addresses the problem of finding the largest group of vertices that is dense (termed a dense vertexlet) in a weighted scale-free graph. Density quantifies the degree of similarity within a group of vertices in a graph. The density of a vertexlet is defined in a novel way that ensures significant participation of every vertex within the vertexlet. The problem is shown to be NP-complete. An upper bound on the order of the largest dense vertexlet of a weighted graph, with respect to a given density threshold, is also derived. Finally, an $O(n^2 \log n)$ heuristic graph mining algorithm, where $n$ denotes the number of vertices in the graph, is presented that produces an approximate solution to the problem.
