Multi-Target Regression via Robust Low-Rank Learning

Multi-target regression has recently regained popularity because it can learn multiple related regression tasks simultaneously and has wide applications in data mining, computer vision and medical image analysis. The main challenge lies in jointly handling inter-target correlations and input-output relationships. In this paper, we propose Multi-layer Multi-target Regression (MMR), which simultaneously models intrinsic inter-target correlations and nonlinear input-output relationships in a general framework via robust low-rank learning. Specifically, the MMR explicitly encodes inter-target correlations in a structure matrix by matrix elastic nets (MEN); it works in conjunction with the kernel trick to disentangle highly complex nonlinear input-output relationships; and it is solved efficiently by a new alternating optimization algorithm with guaranteed convergence. The MMR thus combines the strength of kernel methods for nonlinear feature learning with the structural advantage of multi-layer learning architectures for inter-target correlation modeling. More importantly, it offers a new multi-layer learning paradigm for multi-target regression with high generality, flexibility and expressive ability. Extensive experimental evaluation on 18 diverse real-world datasets shows that the MMR achieves consistently high performance and outperforms representative state-of-the-art algorithms, demonstrating its effectiveness and generality for multivariate prediction.
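The abstract does not state the MEN penalty or the update rule for the structure matrix, so the following is only an illustrative sketch under an assumed, commonly used form of the matrix elastic net: a nuclear-norm term plus a squared Frobenius-norm term, lam1*||S||_* + (lam2/2)*||S||_F^2. Under that assumption, the proximal step has a closed form that soft-thresholds the singular values and then rescales them, which is how such a penalty encourages a low-rank structure matrix. The function name `men_prox`, the parameters `lam1` and `lam2`, and the toy matrix `A` are hypothetical and are not taken from the paper.

```python
import numpy as np


def men_prox(A, lam1, lam2):
    """Solve min_S 0.5*||S - A||_F^2 + lam1*||S||_* + (lam2/2)*||S||_F^2.

    Assumed form of the matrix elastic net; the paper's exact
    formulation may differ.
    """
    U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    # Soft-threshold the singular values (nuclear-norm part), then
    # shrink them uniformly (Frobenius-norm part).
    shrunk = np.maximum(sigma - lam1, 0.0) / (1.0 + lam2)
    return (U * shrunk) @ Vt


rng = np.random.default_rng(0)
# Hypothetical 4-target structure matrix: a rank-2 signal plus small noise.
A = rng.standard_normal((4, 2)) @ rng.standard_normal((2, 4))
A += 0.01 * rng.standard_normal((4, 4))

S = men_prox(A, lam1=0.1, lam2=0.5)
print("singular values before:", np.linalg.svd(A, compute_uv=False))
print("singular values after: ", np.linalg.svd(S, compute_uv=False))
```

Singular values below `lam1` are set exactly to zero, so the noise directions tend to be removed while the dominant correlation directions are kept in shrunken form. In an alternating scheme of the kind the abstract describes, a step like this would plausibly alternate with a kernel regression update, but that pairing is an assumption here rather than the paper's stated algorithm.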
