Performance criteria for graph clustering and Markov cluster experiments

textabstractIn~[1] a cluster algorithm for graphs was introduced called the Markov cluster algorithm or MCL~algorithm. The algorithm is based on simulation of (stochastic) flow in graphs by means of alternation of two operators, expansion and inflation. The results in~[2] establish an intrinsic relationship between the corresponding algebraic process (MCL~process) and cluster structure in the iterands and the limits of the process. Several kinds of experiments conducted with the MCL~algorithm are described here. Test cases with varying homogeneity characteristics are used to establish some of the particular strengths and weaknesses of the algorithm. In general the algorithm performs well, except for graphs which are very homogeneous (such as weakly connected grids) and for which the natural cluster diameter (i.e. the diameter of a subgraph induced by a natural cluster) is large. This can be understood in terms of the flow characteristics of the MCL~algorithm and the heuristic on which the algorithm is grounded. A generic performance criterion for clusterings of weighted graphs is derived, by a stepwise refinement of a simple and appealing criterion for simple graphs. The most refined criterion uses a particular Schur convex function, several properties of which are established. A metric is defined on the space of partitions, which is useful for comparing different clusterings of the same graph. The metric is compared with the metric known as the equivalence mismatch coefficient. The performance criterion and the metric are used for the quantitative measurement of experiments conducted with the MCL~algorithm on randomly generated test graphs with 10000 nodes. Scaling the MCL~algorithm requires a regime of pruning the stochastic matrices which need to be computed. The effect of pruning on the quality of the retrieved clusterings is also investigated. [1] A cluster algorithm for graphs. Technical report INS-R0010, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000. [2] A stochastic uncoupling process for graphs. Technical report INS-R0011, National Research Institute for Mathematics and Computer Science in the Netherlands, Amsterdam, 2000.