Higher Order Fused Regularization for Supervised Learning with Grouped Parameters

We often encounter situations in supervised learning where there may exist groups consisting of more than two parameters. For example, parameters may correspond to words expressing the same meaning, music pieces in the same genre, or books released in the same year. Based on such auxiliary information, we can assume that parameters in a group play similar roles in a problem and take similar values. In this paper, we propose Higher Order Fused (HOF) regularization, which incorporates smoothness among parameters with group structure as prior knowledge in supervised learning. We define the HOF penalty as the Lovász extension of a submodular higher-order potential function; used as a regularizer, it encourages parameters in a group to take similar estimated values. Moreover, we develop an efficient network flow algorithm for computing the proximity operator of the regularized problem. We investigate the empirical performance of the proposed algorithm on synthetic and real-world data.
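As a minimal sketch of the core construction, the Lovász extension of any set function F with F(∅) = 0 can be evaluated by Edmonds' greedy algorithm: sort the coordinates of w in decreasing order and accumulate marginal gains of F along the resulting chain of sets. The potential F used below (a concave function of group cardinality, which is submodular) is a hypothetical stand-in for the paper's higher-order potential, chosen only to illustrate that the extension, used as a penalty, is smaller when grouped parameters share the same value:

```python
import numpy as np

def lovasz_extension(F, w):
    """Evaluate the Lovász extension of a set function F (F(frozenset()) == 0)
    at the vector w, via the greedy algorithm: sort indices by decreasing w_i
    and sum w_i times the marginal gain F(S_k) - F(S_{k-1})."""
    order = np.argsort(-w)          # coordinate indices, largest w first
    val, prev, S = 0.0, 0.0, []
    for i in order:
        S.append(int(i))
        cur = F(frozenset(S))
        val += w[i] * (cur - prev)  # weight times marginal gain of adding i
        prev = cur
    return val

# Hypothetical submodular potential on one group: concave in cardinality.
def F(S):
    return float(np.sqrt(len(S)))

w_equal  = np.array([1.0, 1.0, 1.0])  # grouped parameters with equal values
w_spread = np.array([2.0, 1.0, 0.0])  # same total, values spread apart
print(lovasz_extension(F, w_equal))   # sqrt(3) ≈ 1.732
print(lovasz_extension(F, w_spread))  # ≈ 2.414, larger penalty
```

Under this toy potential, the equal-valued vector attains the smaller penalty, which is the behavior the HOF regularizer exploits for grouped parameters; the paper's actual potential and its network-flow proximity operator are more involved than this sketch.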
