Multi-Task Averaging

We present a multi-task learning approach to jointly estimate the means of multiple independent data sets. The proposed multi-task averaging (MTA) algorithm results in a convex combination of the single-task averages. We derive the optimal amount of regularization, and show that it can be effectively estimated. Simulations and real data experiments demonstrate that MTA outperforms both maximum likelihood and James-Stein estimators, and that our approach to estimating the amount of regularization rivals cross-validation in performance but is more computationally efficient.
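
The abstract does not state the estimator in closed form. As a minimal sketch, assume MTA penalizes squared differences between task estimates, weighted by a pairwise task-similarity matrix A, so that the estimates solve the linear system (I + (γ/T) Σ L) Y* = Ȳ, where Ȳ holds the single-task averages, Σ is diagonal with the variance of each sample mean, and L is the graph Laplacian of A. Under that assumption each estimate is a convex combination of the single-task averages. The names `mta_sketch`, `solve`, `similarity`, and `gamma`, and the plug-in sample variances, are illustrative choices and not taken from the text.

```python
from statistics import fmean, variance


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mta_sketch(samples, similarity, gamma):
    """Shrink each task's sample mean toward the means of similar tasks.

    Assumed form (not stated in the abstract): solve
    (I + (gamma / T) * Sigma * L) y = ybar, with L = D - A.
    """
    T = len(samples)
    means = [fmean(s) for s in samples]
    # Assumed plug-in for the variance of each sample mean: s_t^2 / N_t.
    var_of_mean = [variance(s) / len(s) for s in samples]
    system = []
    for r in range(T):
        degree = sum(similarity[r])
        row = []
        for s in range(T):
            laplacian = (degree if r == s else 0.0) - similarity[r][s]
            identity = 1.0 if r == s else 0.0
            row.append(identity + gamma / T * var_of_mean[r] * laplacian)
        system.append(row)
    return means, solve(system, means)


# Hypothetical data: three tasks, all treated as equally similar.
samples = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [10.0, 11.0, 12.0, 13.0]]
similarity = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
means, estimates = mta_sketch(samples, similarity, gamma=1.0)
print(means)
print(estimates)
```

Setting `gamma=0.0` recovers the single-task averages, while a very large `gamma` pulls all estimates toward a common value. The sketch fixes `gamma` by hand; it does not implement the optimal-regularization estimate described in the abstract.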
