Transfer learning of regression models from a sequence of datasets by penalized estimation.

Transfer learning refers to the promising idea of initializing model fits based on pre-training on other data. We consider in particular regression modeling settings in which parameter estimates from previous data can serve as anchoring points, yet may not be available for all parameters, so that covariance information cannot be reused. We present a procedure that updates the model through targeted penalized estimation, which shrinks the estimator towards a nonzero value. When an update is sought from novel data, the parameter estimate from the previous data serves as this nonzero value. The procedure extends naturally to a sequence of data sets with the same response but potentially only partial overlap in covariates. The iteratively updated regression parameter estimator is shown to be asymptotically unbiased and consistent. The penalty parameter is chosen through constrained optimization of the cross-validated log-likelihood. The constraint imposes a lower bound on the amount of shrinkage of the updated estimator towards the current one. This bound aims to preserve the updated estimator's goodness of fit on all data but the novel batch. The proposed approach is compared to other regression modeling procedures. Finally, it is illustrated on an epidemiological study in which the data arrive in batches with differing covariate availability and the model is refitted whenever a novel batch becomes available.
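To make the idea of shrinking towards a nonzero value concrete, the following is a minimal sketch for the Gaussian linear model only, not the implementation of the proposed procedure. It assumes a ridge-type penalty lam * ||b - target||^2, a fixed penalty parameter `lam` (the constrained cross-validation described above is not implemented), and a target of zero for covariates that have no previous estimate. The function names `targeted_ridge` and `update_over_batches` and the batch format are hypothetical.

```python
import numpy as np


def targeted_ridge(X, y, target, lam):
    """Closed-form minimiser of ||y - X b||^2 + lam * ||b - target||^2.

    With lam = 0 this is ordinary least squares (if X has full column
    rank); as lam grows, the estimate is pulled towards `target`.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * target)


def update_over_batches(batches, lam):
    """Sequentially update estimates over batches of (X, y, names).

    `names` labels the columns of X. Assumptions of this sketch:
    covariates seen for the first time are shrunk towards zero, and
    covariates absent from a batch keep their previous estimate.
    """
    estimate = {}
    for X, y, names in batches:
        # Previous estimates act as the shrinkage target where available.
        target = np.array([estimate.get(name, 0.0) for name in names])
        beta = targeted_ridge(X, y, target, lam)
        estimate.update(zip(names, beta))
    return estimate
```

Each batch is passed as a tuple of design matrix, response vector, and column names, so that batches with only partially overlapping covariates can be processed in sequence. A larger `lam` keeps the updated estimate closer to the previous one.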
