Multivariate calibration on heterogeneous samples

Abstract: Data heterogeneity has become a challenging problem in modern data analysis. Classic statistical modeling methods, which assume that the data are independent and identically distributed, often perform poorly on heterogeneous data. This work is motivated by a multivariate calibration problem from a soil characterization study in which the samples were collected from five different locations. Newly proposed and existing signal regression models are applied to this calibration problem, with the models adapted to handle the spatially clustered structure of the samples. Compared with a variety of other methods, e.g., kernel ridge regression, random forests, and partial least squares, our newly proposed varying-coefficient signal regression model is highly competitive, often outperforming the other methods in terms of external prediction error.
