Semi-Supervised Multitask Learning

A semi-supervised multitask learning (MTL) framework is presented, in which M parameterized semi-supervised classifiers, each associated with one of M partially labeled data manifolds, are learned jointly under a soft-sharing prior imposed over the classifier parameters. The unlabeled data are exploited by basing classifier learning on neighborhoods induced by a Markov random walk over a graph representation of each manifold. Experimental results on real data sets demonstrate that semi-supervised MTL yields significant improvements in generalization performance over both semi-supervised single-task learning (STL) and supervised MTL.
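The neighborhood construction the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a fully connected Gaussian-affinity graph (the Szummer–Jaakkola construction typically uses a kNN graph), and the function name, bandwidth `sigma`, and walk length `t` are illustrative choices.

```python
import numpy as np

def random_walk_neighborhoods(X, sigma=1.0, t=3):
    """t-step Markov random walk transition probabilities over a
    Gaussian-affinity graph on the points in X (rows = samples).
    Row i of the result is the soft neighborhood of point i."""
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))    # Gaussian edge affinities
    np.fill_diagonal(W, 0.0)              # no self-loops
    A = W / W.sum(axis=1, keepdims=True)  # one-step transition matrix
    return np.linalg.matrix_power(A, t)   # t-step transition matrix

# Toy "manifold": two well-separated noisy clusters with one labeled
# point each; the walk keeps probability mass within each cluster, so
# unlabeled points inherit the label of the cluster they sit on.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
P = random_walk_neighborhoods(X, sigma=0.5, t=4)
scores = P[:, [0, 20]]       # walk mass reaching each labeled point
pred = scores.argmax(axis=1)  # nearest labeled point under the walk
```

In the paper's setting, such walk-induced neighborhoods enter the classifier's likelihood for each of the M tasks rather than being used for direct label propagation as in this sketch.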
