Multi-class Transductive Learning Based on ℓ1 Relaxations of Cheeger Cut and Mumford-Shah-Potts Model

Recent advances in ℓ1 optimization for imaging problems provide promising tools for solving the fundamental problem of high-dimensional data classification in machine learning. In this paper, we extend the main result of Szlam and Bresson (Proceedings of the 27th International Conference on Machine Learning, pp. 1039–1046, 2010), which introduced an exact ℓ1 relaxation of the Cheeger ratio cut problem for unsupervised data classification. The proposed extension addresses the multi-class transductive learning problem, which consists of learning several classes given a set of labels for each class. Learning several classes (i.e. more than two) simultaneously is generally a challenging problem, but the proposed method builds on strong results introduced in imaging to overcome the multi-class issue. In addition, the proposed multi-class transductive learning algorithms benefit from recent fast ℓ1 solvers specifically designed for the total variation norm, which plays a central role in our approach. Finally, experiments demonstrate that the proposed ℓ1 relaxation algorithms are more accurate and robust than standard ℓ2 relaxation methods such as spectral clustering, particularly when only a very small number of labels is available for each class. For instance, on the benchmark MNIST dataset of 60,000 data points in $\mathbb{R}^{784}$, the mean classification error of the proposed ℓ1 relaxation of the multi-class Cheeger cut is 2.4 % when only one label is given for each class, while the classification error of the ℓ2 relaxation method of spectral clustering is 24.7 %.
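To make the two-class starting point concrete, the following minimal sketch evaluates a Cheeger ratio cut and an ℓ1 (total variation) ratio on a toy graph. It assumes the cardinality-balanced form cut(S, Sᶜ) / min(|S|, |Sᶜ|) and the relaxed energy ‖Du‖₁ / ‖u − median(u)‖₁, and it is not the multi-class algorithm of the paper. The function names and the six-node graph are hypothetical, chosen only for illustration.

```python
from statistics import median

def total_variation(W, u):
    """Graph total variation: sum over edges i<j of w_ij * |u_i - u_j|."""
    n = len(u)
    return sum(W[i][j] * abs(u[i] - u[j])
               for i in range(n) for j in range(i + 1, n))

def cheeger_ratio(W, S):
    """cut(S, S^c) / min(|S|, |S^c|) for a vertex subset S."""
    n = len(W)
    S = set(S)
    cut = sum(W[i][j] for i in S for j in range(n) if j not in S)
    return cut / min(len(S), n - len(S))

def l1_ratio(W, u):
    """Relaxed energy ||Du||_1 / ||u - median(u)||_1 for a real-valued u."""
    m = median(u)
    return total_variation(W, u) / sum(abs(x - m) for x in u)

# Toy graph (illustrative assumption): two triangles joined by one weak edge.
W = [[0.0] * 6 for _ in range(6)]
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0),
         (2, 3, 0.1)]
for i, j, w in edges:
    W[i][j] = W[j][i] = w

S = [0, 1, 2]                                   # one triangle
u = [1.0 if i in S else 0.0 for i in range(6)]  # indicator function of S

# On an indicator function the relaxed energy equals the Cheeger ratio.
print(cheeger_ratio(W, S), l1_ratio(W, u))
```

On indicator functions the two quantities coincide, which is the sense in which the ℓ1 energy relaxes the combinatorial cut; minimizing it over real-valued u requires a dedicated solver and is not attempted here.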

[1]  Daniel Cremers,et al.  Continuous ratio optimization via convex relaxation with applications to multiview 3D reconstruction , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Jan-Michael Frahm,et al.  Fast Global Labeling for Real-Time Stereo Using Multiple Plane Sweeps , 2008, VMV.

[3]  David L. Donoho,et al.  De-noising by soft-thresholding , 1995, IEEE Trans. Inf. Theory.

[4]  Jing Yuan,et al.  Convex Multi-class Image Labeling by Simplex-Constrained Total Variation , 2009, SSVM.

[5]  J. Cheeger.  A lower bound for the smallest eigenvalue of the Laplacian , 1969 .

[6]  Christoph Schnörr,et al.  Continuous Multiclass Labeling Approaches and Algorithms , 2011, SIAM J. Imaging Sci.

[7]  T. Chan,et al.  A Variational Level Set Approach to Multiphase Motion , 1996 .

[8]  A. Fiacco.  A Finite Algorithm for Finding the Projection of a Point onto the Canonical Simplex of $\mathbb{R}^n$ , 2009 .

[9]  Vladimir Kolmogorov,et al.  An Experimental Comparison of Min-Cut/Max-Flow Algorithms for Energy Minimization in Vision , 2004, IEEE Trans. Pattern Anal. Mach. Intell.

[10]  Matthias Hein,et al.  Spectral clustering based on the graph p-Laplacian , 2009, ICML '09.

[11]  Matthias Hein,et al.  Beyond Spectral Clustering - Tight Relaxations of Balanced Graph Cuts , 2011, NIPS.

[12]  Daniel Cremers,et al.  A convex approach for computing minimal partitions , 2008 .

[13]  Werner Dinkelbach.  On Nonlinear Fractional Programming , 1967 .

[14]  José M. Bioucas-Dias,et al.  A New TwIST: Two-Step Iterative Shrinkage/Thresholding Algorithms for Image Restoration , 2007, IEEE Transactions on Image Processing.

[15]  Xue-Cheng Tai,et al.  Global Minimization for Continuous Multiphase Partitioning Problems Using a Dual Approach , 2011, International Journal of Computer Vision.

[16]  R. B. Potts Some generalized order-disorder transformations , 1952, Mathematical Proceedings of the Cambridge Philosophical Society.

[17]  Andrew Zisserman,et al.  Advances in Neural Information Processing Systems (NIPS) , 2007 .

[18]  Vladimir Kolmogorov,et al.  Applications of parametric maxflow in computer vision , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[19]  Fan Chung,et al.  Spectral Graph Theory , 1996 .

[20]  Xavier Bresson,et al.  Total Variation, Cheeger Cuts , 2010, ICML.

[21]  Mikhail Belkin,et al.  Problems of learning on manifolds , 2003 .

[22]  Jitendra Malik,et al.  Normalized cuts and image segmentation , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  Christopher K. I. Williams,et al.  Advances in Neural Information Processing Systems 15 (NIPS 2002) , 2002 .

[24]  Tom Goldstein,et al.  The Split Bregman Method for L1-Regularized Problems , 2009, SIAM J. Imaging Sci.

[25]  Arthur D. Szlam,et al.  Total variation and cheeger cuts , 2010, ICML 2010.

[26]  Tony F. Chan,et al.  A Multiphase Level Set Framework for Image Segmentation Using the Mumford and Shah Model , 2002, International Journal of Computer Vision.

[27]  Matthias Hein,et al.  An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA , 2010, NIPS.

[28]  Gilbert Strang,et al.  Maximal flow through a domain , 1983, Math. Program.

[29]  Pietro Perona,et al.  Self-Tuning Spectral Clustering , 2004, NIPS.

[30]  R. Glowinski,et al.  Augmented Lagrangian and Operator-Splitting Methods in Nonlinear Mechanics , 1987 .

[31]  D. Mumford,et al.  Optimal approximations by piecewise smooth functions and associated variational problems , 1989 .