Multi-Task Learning with Incomplete Data for Healthcare

Multi-task learning is a form of transfer learning that trains multiple tasks simultaneously and leverages the information shared between related tasks to improve generalization performance. However, missing features in the input matrix pose a difficult problem that must be addressed carefully. Removing records with missing values can substantially reduce the sample size, which is impractical for datasets with a large percentage of missing values, while popular imputation methods often distort the covariance structure of the data and lead to inaccurate inference. In this paper we propose using plug-in covariance matrix estimators to tackle the challenge of missing features. Specifically, we analyze the plug-in estimators within a robust multi-task learning framework with LASSO and graph regularization, where the graph captures the relatedness between tasks. We use an Alzheimer's disease progression dataset as an example to show that the proposed framework is effective for prediction and model estimation when missing data are present.
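As a rough illustration of the plug-in idea, the sketch below assumes features are missing completely at random with a uniform, known (or estimated) observation probability, fills missing entries with zeros, and corrects the resulting Gram matrix before running a proximal-gradient solver for a graph-regularized multi-task LASSO. The function names (`plugin_covariance`, `mtl_graph_lasso`), the Laplacian form of the graph penalty, and the fixed step size are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def plugin_covariance(Z, obs_prob):
    """Unbiased plug-in estimate of the feature Gram matrix.

    Z        : (n, d) feature matrix with missing entries replaced by zero.
    obs_prob : probability that an entry is observed (assumed uniform and
               known, or estimated as the observed fraction of entries).
    """
    n = Z.shape[0]
    gram = Z.T @ Z / n                        # Gram matrix of the zero-filled data
    cov = gram / obs_prob ** 2                # off-diagonal entries are attenuated by p^2
    np.fill_diagonal(cov, np.diag(gram) / obs_prob)  # diagonal entries only by p
    return cov

def plugin_cross_cov(Z, y, obs_prob):
    """Unbiased plug-in estimate of E[x * y] from zero-filled features."""
    return Z.T @ y / (len(y) * obs_prob)

def soft_threshold(v, t):
    """Proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def mtl_graph_lasso(Z_list, y_list, laplacian, obs_prob,
                    lam1=0.1, lam2=0.1, step=1e-3, n_iter=1000):
    """Proximal-gradient sketch of multi-task LASSO with graph regularization,
    run on plug-in covariance estimates instead of the raw incomplete data.

    Z_list, y_list : per-task zero-filled feature matrices and responses.
    laplacian      : (T, T) graph Laplacian encoding task relatedness.
    """
    T = len(Z_list)
    d = Z_list[0].shape[1]
    covs = [plugin_covariance(Z, obs_prob) for Z in Z_list]
    rhos = [plugin_cross_cov(Z, y, obs_prob) for Z, y in zip(Z_list, y_list)]
    B = np.zeros((d, T))                      # one coefficient column per task
    for _ in range(n_iter):
        # Gradient of the smooth part: per-task quadratic loss
        # 0.5 * b' Sigma b - rho' b, plus the graph penalty lam2 * tr(B L B').
        grad = np.column_stack([covs[t] @ B[:, t] - rhos[t] for t in range(T)])
        grad += 2.0 * lam2 * (B @ laplacian)
        B = soft_threshold(B - step * grad, step * lam1)   # l1 proximal step
    return B
```

One caveat worth noting: when features outnumber observations, the corrected Gram matrix can be indefinite and the objective is no longer guaranteed to be convex, so in that regime one would typically rely on the constrained projected-gradient treatment of Loh and Wainwright rather than plain proximal descent as sketched here.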
