Multi-Target Regression via Robust Low-Rank Learning

Multi-target regression has recently regained popularity because it learns multiple related regression tasks simultaneously and has wide applications in data mining, computer vision and medical image analysis. Its central challenge is to handle inter-target correlations and input-output relationships jointly. In this paper, we propose Multi-layer Multi-target Regression (MMR), which simultaneously models intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR (1) explicitly encodes inter-target correlations in a structure matrix via matrix elastic nets (MEN); (2) works in conjunction with the kernel trick to disentangle highly complex nonlinear input-output relationships; and (3) is solved efficiently by a new alternating optimization algorithm with guaranteed convergence. The MMR thereby combines the strength of kernel methods for nonlinear feature learning with the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets shows that the MMR achieves consistently high performance and outperforms representative state-of-the-art algorithms, demonstrating its effectiveness and generality for multivariate prediction.
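
To make the matrix elastic net concrete, the following is a minimal sketch that assumes the commonly used form of the penalty, a nuclear norm plus a squared Frobenius norm on the structure matrix. The abstract does not state the exact objective, so the function names (men_penalty, men_prox) and the weights lam_nuc and lam_fro are illustrative assumptions, not the authors' notation or implementation. The second function is the proximal step for this penalty (singular value soft-thresholding followed by uniform shrinkage), which is the kind of closed-form update an alternating scheme could use for the structure matrix.

```python
import numpy as np

def men_penalty(S, lam_nuc, lam_fro):
    """Assumed matrix elastic net: lam_nuc * ||S||_* + 0.5 * lam_fro * ||S||_F^2."""
    sigma = np.linalg.svd(S, compute_uv=False)  # singular values of S
    return lam_nuc * sigma.sum() + 0.5 * lam_fro * np.sum(sigma ** 2)

def men_prox(Z, lam_nuc, lam_fro):
    """Minimize 0.5 * ||S - Z||_F^2 + men_penalty(S, lam_nuc, lam_fro) over S.

    The minimizer soft-thresholds the singular values of Z by lam_nuc
    (promoting low rank) and then scales them by 1 / (1 + lam_fro).
    """
    U, sigma, Vt = np.linalg.svd(Z, full_matrices=False)
    shrunk = np.maximum(sigma - lam_nuc, 0.0) / (1.0 + lam_fro)
    return (U * shrunk) @ Vt  # scale columns of U, then recombine

# Hypothetical 2x2 structure matrix used only for illustration.
Z = np.diag([3.0, 1.0])
S = men_prox(Z, lam_nuc=1.0, lam_fro=1.0)
```

In this toy case the smaller singular value is thresholded to zero, so the result has rank one; the nuclear-norm term drives low rank while the Frobenius term keeps the solution stable and unique.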
