Project and Forget: Solving Large-Scale Metric Constrained Problems

Given a set of dissimilarity measurements among data points, determining which metric representation is most "consistent" with the input measurements, or which metric best captures the relevant geometric features of the data, is a key step in many machine learning algorithms. Existing methods are restricted to specific kinds of metrics or to small problem sizes because of the large number of metric constraints in such problems. In this paper, we provide an active set algorithm, Project and Forget, that uses Bregman projections to solve metric constrained problems with many (possibly exponentially many) inequality constraints. We provide a theoretical analysis of Project and Forget and prove that our algorithm converges to the global optimal solution and that the $L_2$ distance of the current iterate to the optimal solution decays asymptotically at an exponential rate. We demonstrate that our method can solve large problem instances of three types of metric constrained problems: general weight correlation clustering, metric nearness, and metric learning. In each case, it outperforms state-of-the-art methods with respect to CPU time and problem size.
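To make the active set idea concrete, the following is a minimal sketch of a project-and-forget style loop for one of the problems named above, metric nearness, under several assumptions that are not taken from the paper. The objective is assumed to be squared Euclidean distance to the input dissimilarities, so each Bregman projection reduces to a Euclidean projection onto a halfspace with a Hildreth-style dual correction. The separation oracle is a brute-force scan over all triples, a stand-in chosen for brevity that only suits small inputs. The names `edge`, `violated_triangles`, and `project_and_forget_sketch` are hypothetical.

```python
import itertools


def edge(i, j):
    """Canonical dictionary key for the unordered pair {i, j}."""
    return (i, j) if i < j else (j, i)


def violated_triangles(x, n, tol):
    """Brute-force oracle (assumption): triples with x_ij > x_ik + x_kj + tol."""
    found = []
    for i, j in itertools.combinations(range(n), 2):
        for k in range(n):
            if k in (i, j):
                continue
            if x[edge(i, j)] - x[edge(i, k)] - x[edge(k, j)] > tol:
                found.append((i, j, k))
    return found


def project_and_forget_sketch(d, n, max_iters=1000, tol=1e-8):
    """Approximate the nearest (in L2) set of distances obeying all
    triangle inequalities x_ij <= x_ik + x_kj, starting from d."""
    x = dict(d)  # primal iterate, keyed by edge (i, j) with i < j
    z = {}       # dual variable for each remembered constraint
    for _ in range(max_iters):
        # Oracle step: find violated constraints and remember them.
        found = violated_triangles(x, n, tol)
        for c in found:
            z.setdefault(c, 0.0)

        # Project step: one pass over the remembered constraints.
        max_step = 0.0
        for (i, j, k) in list(z):
            e_ij, e_ik, e_kj = edge(i, j), edge(i, k), edge(k, j)
            # Signed distance to the halfspace; the normal has squared norm 3.
            theta = -(x[e_ij] - x[e_ik] - x[e_kj]) / 3.0
            # Dual correction: never undo more than was previously applied.
            step = min(z[(i, j, k)], theta)
            x[e_ij] += step
            x[e_ik] -= step
            x[e_kj] -= step
            z[(i, j, k)] -= step
            max_step = max(max_step, abs(step))

        # Forget step: drop constraints whose dual variable is zero.
        z = {c: v for c, v in z.items() if v > 0.0}

        if not found and max_step <= tol:
            break
    return x
```

For example, calling `project_and_forget_sketch({(0, 1): 10.0, (0, 2): 1.0, (1, 2): 1.0}, 3)` repairs a single badly violated triangle. Keeping only constraints with nonzero dual variables is what keeps the working set small in this sketch; a scalable implementation would also need an oracle that avoids enumerating all triples.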
