Smooth approximation method for non-smooth empirical risk minimization based distance metric learning

Distance metric learning (DML) has become a very active research field in recent years. Bian and Tao (IEEE Trans. Neural Netw. Learn. Syst. 23(8) (2012) 1194-1205) presented a constrained empirical risk minimization (ERM) framework for DML. In this paper, we use a smooth approximation method to make their algorithm applicable to the non-differentiable hinge loss function. We show that the objective function with hinge loss is equivalent to a non-smooth min-max representation, from which a smooth approximate objective function is derived. Unlike the original objective function, the approximate one is differentiable with a Lipschitz-continuous gradient. Consequently, Nesterov's optimal first-order method can be applied directly. Finally, we evaluate the effectiveness of our method on various UCI datasets.
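
The smoothing idea can be illustrated on a single scalar hinge term. The sketch below is not the paper's objective; it assumes the textbook max-form of the hinge, max(0, z) = max over u in [0, 1] of u*z, and subtracts a quadratic prox term (mu/2)*u^2 in the style of Nesterov's smoothing technique. The function names and the smoothing parameter mu are hypothetical and chosen only for illustration.

```python
def hinge(z):
    """Non-smooth hinge: max(0, z) = max over u in [0, 1] of u * z."""
    return max(0.0, z)


def smooth_hinge(z, mu):
    """Smoothed hinge: max over u in [0, 1] of (u * z - 0.5 * mu * u**2).

    Assumption: a quadratic prox term with parameter mu > 0, which is one
    common choice and not necessarily the one used in the paper.
    """
    # The inner maximizer is z / mu clipped to the interval [0, 1].
    u = min(1.0, max(0.0, z / mu))
    return u * z - 0.5 * mu * u * u


def smooth_hinge_grad(z, mu):
    """Derivative of smooth_hinge in z; it equals the inner maximizer u."""
    return min(1.0, max(0.0, z / mu))


if __name__ == "__main__":
    mu = 0.5
    for z in (-1.0, 0.25, 2.0):
        print(z, hinge(z), smooth_hinge(z, mu), smooth_hinge_grad(z, mu))
```

Under these assumptions the smoothed value lies between hinge(z) - mu/2 and hinge(z), and its derivative is Lipschitz continuous with constant 1/mu. A smaller mu gives a tighter approximation but a larger Lipschitz constant, which is the trade-off a first-order method has to balance.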

[1]  Aharon Ben-Tal, et al. Lectures on modern convex optimization, 1987.

[2]  Weifeng Liu, et al. Multiview Hessian Regularization for Image Annotation, 2013, IEEE Transactions on Image Processing.

[3]  Yurii Nesterov, et al. Smooth minimization of non-smooth functions, 2005, Math. Program.

[4]  Michael I. Jordan, et al. Distance Metric Learning with Application to Clustering with Side-Information, 2002, NIPS.

[5]  Arkadi Nemirovski, et al. Lectures on modern convex optimization - analysis, algorithms, and engineering applications, 2001, MPS-SIAM series on optimization.

[6]  Zenglin Xu, et al. Robust Metric Learning by Smooth Optimization, 2010, UAI.

[7]  Lei Wang, et al. Positive Semidefinite Metric Learning with Boosting, 2009, NIPS.

[8]  Cordelia Schmid, et al. Is that you? Metric learning approaches for face identification, 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Zenglin Xu, et al. Smooth Optimization for Effective Multiple Kernel Learning, 2010, AAAI.

[10]  Kaizhu Huang, et al. Sparse Metric Learning via Smooth Optimization, 2009, NIPS.

[11]  Vladimir N. Vapnik, et al. The Nature of Statistical Learning Theory, 2000, Statistics for Engineering and Information Science.

[12]  Kilian Q. Weinberger, et al. Distance Metric Learning for Large Margin Nearest Neighbor Classification, 2005, NIPS.

[13]  Rong Jin, et al. Regularized Distance Metric Learning: Theory and Algorithm, 2009, NIPS.

[14]  Gene H. Golub, et al. Matrix computations, 1983.

[15]  Dacheng Tao, et al. Learning a Distance Metric by Empirical Loss Minimization, 2011, IJCAI.

[16]  Yoram Singer, et al. Online and batch learning of pseudo-metrics, 2004, ICML.

[17]  Stephen P. Boyd, et al. Convex Optimization, 2004, Algorithms and Theory of Computation Handbook.

[18]  Marc Teboulle, et al. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, 2009, SIAM J. Imaging Sci.

[19]  Dacheng Tao, et al. Constrained Empirical Risk Minimization Framework for Distance Metric Learning, 2012, IEEE Transactions on Neural Networks and Learning Systems.

[20]  Knud D. Andersen, et al. The Mosek Interior Point Optimizer for Linear Programming: An Implementation of the Homogeneous Algorithm, 2000.

[21]  Peng Li, et al. Distance Metric Learning with Eigenvalue Optimization, 2012, J. Mach. Learn. Res.