Submodular fractional programming for balanced clustering

We address the balanced clustering problem in which cluster sizes are regularized with submodular functions. The objective function for balanced clustering is a submodular fractional function, i.e., the ratio of two submodular functions, and thus includes the well-known ratio cuts as special cases. In this paper, we present a novel algorithm for minimizing this objective function (submodular fractional programming) using recent submodular optimization techniques. The main idea is to combine an algorithm for minimizing the difference of two submodular functions with the discrete Newton method. As a result, the algorithm applies to objective functions with arbitrary submodular functions in both the numerator and the denominator, which allows flexible clustering setups to be designed. We also give a theoretical analysis of the algorithm and evaluate its performance in comparative experiments against conventional algorithms on artificial and real datasets.
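
The discrete Newton iteration mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the inner step, which the paper solves with a difference-of-submodular minimization algorithm, is replaced here by brute-force enumeration so that the example stays short and exact on a tiny graph. The graph `EDGES`, the function names, and the choice of a ratio-cut-style objective, cut(S) / (|S| |V \ S|), are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy graph: two triangles joined by one weak edge.
N = 6
EDGES = {
    (0, 1): 1.0, (0, 2): 1.0, (1, 2): 1.0,
    (3, 4): 1.0, (3, 5): 1.0, (4, 5): 1.0,
    (2, 3): 0.1,
}

def cut_weight(S, edges):
    """Numerator f(S): total weight of edges crossing the cut (submodular)."""
    return sum(w for (u, v), w in edges.items() if (u in S) != (v in S))

def size_balance(S, n):
    """Denominator g(S) = |S| * |V \\ S|, a concave function of |S| (submodular)."""
    return len(S) * (n - len(S))

def candidate_subsets(n):
    """All nonempty proper subsets of {0, ..., n-1}, so that g(S) > 0."""
    for k in range(1, n):
        for c in combinations(range(n), k):
            yield frozenset(c)

def discrete_newton(f, g, n, max_iter=100, tol=1e-12):
    """Minimize f(S)/g(S) by repeatedly minimizing f(T) - lam * g(T).

    Assumption: the inner minimization is done by exhaustive search,
    standing in for a difference-of-submodular solver.
    """
    S = frozenset([0])
    lam = f(S) / g(S)
    for _ in range(max_iter):
        best = min(candidate_subsets(n), key=lambda T: f(T) - lam * g(T))
        # If no subset makes f - lam * g negative, lam is the optimal ratio.
        if f(best) - lam * g(best) >= -tol:
            break
        S = best
        lam = f(S) / g(S)
    return S, lam

best_set, best_ratio = discrete_newton(
    lambda T: cut_weight(T, EDGES),
    lambda T: size_balance(T, N),
    N,
)
print(sorted(best_set), best_ratio)
```

On this toy graph the iteration separates the two triangles by cutting the weak edge. With an exact inner step the ratio decreases strictly at every update, so the loop terminates; with an approximate inner solver, as in the general submodular setting, that guarantee depends on the solver and is not covered by this sketch.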
