Strahler based graph clustering using convolution

We propose a method for visualizing large graphs. The approach computes a density function by applying a metric to the vertices of the graph. This density function is then filtered by convolution, and the filtered function yields a partition of the graph. Choosing an appropriate convolution kernel makes it possible to control the number of clusters and their size. The algorithm can run automatically, but the user can also set its parameters interactively. We applied the algorithm to two problems: legacy code extraction from the inclusion relation of C++ source files, and film sequence analysis. The metric used here is defined from Strahler numbers, which measure the "ramification" level of graph vertices.
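
The pipeline described above (metric, density function, convolution, partition) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: the metric is the classic Horton-Strahler number on a rooted tree rather than the paper's extension to general graphs, the density function is taken to be a histogram of metric values, and cluster boundaries are placed at local minima of the smoothed histogram. All function names are hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def strahler(tree, root):
    """Classic Horton-Strahler number of each node of a rooted tree.

    `tree` maps a node to the list of its children (assumed input format).
    """
    values = {}
    stack = [(root, False)]  # iterative post-order traversal
    while stack:
        node, visited = stack.pop()
        children = tree.get(node, [])
        if not visited:
            stack.append((node, True))
            stack.extend((child, False) for child in children)
        elif not children:
            values[node] = 1  # a leaf has Strahler number 1
        else:
            ranks = sorted((values[c] for c in children), reverse=True)
            tie = len(ranks) > 1 and ranks[0] == ranks[1]
            # The number increases only when the maximum is reached twice.
            values[node] = ranks[0] + 1 if tie else ranks[0]
    return values


def density(values):
    """Histogram: entry v counts the nodes whose metric value is v."""
    counts = Counter(values.values())
    return [counts.get(v, 0) for v in range(max(counts) + 1)]


def convolve(signal, kernel):
    """Same-length discrete convolution with zero padding (odd kernel)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            k = i + half - j
            if 0 <= k < len(signal):
                acc += weight * signal[k]
        out.append(acc)
    return out


def cut_points(smoothed):
    """Interior indices where the smoothed density has a local minimum."""
    return [
        i
        for i in range(1, len(smoothed) - 1)
        if smoothed[i] < smoothed[i - 1] and smoothed[i] <= smoothed[i + 1]
    ]


def partition(values, cuts):
    """Group nodes by the interval between cut points their value falls in."""
    groups = {}
    for node, value in values.items():
        groups.setdefault(bisect_right(cuts, value), set()).add(node)
    return groups
```

In this sketch, `partition(values, cut_points(convolve(density(values), kernel)))` chains the four steps. A wider averaging kernel tends to remove small dips in the histogram, which leaves fewer cut points and therefore fewer, larger groups; this mirrors, in a simplified way, the role the abstract gives to the choice of kernel.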
