Multi-Task Learning and Its Applications to Biomedical Informatics

In many fields, one needs to build predictive models for a set of related machine learning tasks. Traditionally, these tasks are treated independently and inference is done separately for each one, which ignores the inherent connections among them. Multi-task learning aims to improve generalization performance by building models for all tasks simultaneously, leveraging their inherent relatedness. In this talk, we show how multi-task learning can be applied to improve predictive modeling from electronic medical records (EMR). We consider a novel data-driven framework for densifying EMR that addresses the challenges data sparsity poses when EMR are used for predictive modeling. By treating the densification of each patient as a learning task, the proposed multi-task learning algorithm densifies all patients simultaneously, so that the densification of one patient leverages useful information from other patients. Experiments on real clinical data show that densification can significantly improve predictive performance.
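
The abstract does not specify the densification algorithm, so the following is only a minimal illustrative sketch of the general idea, not the method presented in the talk. It assumes a patients-by-features matrix with a boolean mask of observed entries and swaps in a standard technique, regularized low-rank factorization fitted by alternating least squares. Each patient's coefficient vector is fitted as its own small ridge problem (one "task"), while the feature basis is shared across all patients, which is how one patient's densification can borrow information from the others. The function name `densify` and the parameters `rank`, `lam`, and `n_iters` are hypothetical.

```python
import numpy as np


def densify(X, mask, rank=2, lam=0.01, n_iters=50, seed=0):
    """Fill unobserved entries of X (patients x features).

    Illustrative assumption: X is approximately low rank, X ~ U @ V.T.
    mask[i, j] is True where X[i, j] was actually observed.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    U = rng.normal(scale=0.1, size=(n, rank))  # per-patient coefficients
    V = rng.normal(scale=0.1, size=(d, rank))  # basis shared by all patients
    reg = lam * np.eye(rank)
    for _ in range(n_iters):
        # Per-patient step: one ridge fit per patient on that patient's
        # observed entries, using the shared basis V.
        for i in range(n):
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ X[i, obs])
        # Shared step: each feature's basis row pools information from
        # every patient for whom that feature was observed.
        for j in range(d):
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ X[obs, j])
    return U @ V.T


# Synthetic usage example (not clinical data): hide 40% of a rank-2 matrix.
rng = np.random.default_rng(1)
truth = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 12))
mask = rng.random(truth.shape) < 0.6
X_sparse = np.where(mask, truth, 0.0)
X_dense = densify(X_sparse, mask)
hidden_rmse = np.sqrt(np.mean((X_dense - truth)[~mask] ** 2))
print("RMSE on hidden entries:", hidden_rmse)
```

In a predictive-modeling pipeline, the densified matrix would replace the sparse one as input to a downstream classifier or regressor; whether that helps on real EMR data depends on how well the low-rank assumption holds.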
