Robust Regression with Data-Dependent Regularization Parameters and Autoregressive Temporal Correlations

We introduce robust procedures for analyzing water quality data collected over time. One challenge in analyzing such data is achieving robustness in the presence of outliers while maintaining high estimation efficiency, so that we can draw valid conclusions and provide useful advice for water management. The robust approach requires specifying a loss function, such as the Huber, Tukey’s bisquare, or exponential loss function, and an associated tuning parameter that determines the extent of robustness needed. High robustness comes at the cost of efficiency loss in parameter estimation. To address this trade-off, we propose a data-driven method that leads to more efficient parameter estimation. This data-dependent approach allows us to choose a regularization (tuning) parameter that depends on the proportion of “outliers” in the data, so that estimation efficiency is maximized. We illustrate the proposed methods using a study of ammonium nitrogen concentrations from two sites in the Huaihe River in China, where the interest is in quantifying the trend in the most recent years while accounting for possible temporal correlations and “irregular” observations in earlier years.
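The idea of a data-dependent tuning parameter can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the procedure of the paper: it uses the Huber loss, ignores temporal correlation, and picks the tuning constant c from a grid by maximizing a plug-in efficiency estimate, (mean of ψ′(r))² / mean of ψ(r)², computed on standardized residuals from a pilot fit. The function names, the grid, the pilot value c = 1.345, and the simulated data are all hypothetical choices made for this sketch.

```python
import numpy as np

def huber_psi(r, c):
    # Huber score function: identity inside [-c, c], clipped outside.
    return np.clip(r, -c, c)

def mad_scale(r):
    # Median absolute deviation, rescaled to be consistent for normal errors.
    return max(1.4826 * np.median(np.abs(r - np.median(r))), 1e-12)

def huber_fit(X, y, c, n_iter=100, tol=1e-8):
    # Huber regression via iteratively reweighted least squares (IRLS).
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        resid = y - X @ beta
        u = np.abs(resid / mad_scale(resid))
        w = np.where(u <= c, 1.0, c / np.maximum(u, 1e-12))
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def estimated_efficiency(r, c):
    # Plug-in efficiency factor for the Huber estimator (assumed criterion).
    return np.mean(np.abs(r) <= c) ** 2 / np.mean(huber_psi(r, c) ** 2)

def select_c(X, y, grid, pilot_c=1.345):
    # Standardize residuals from a pilot robust fit, then maximize over the grid.
    resid = y - X @ huber_fit(X, y, pilot_c)
    r = resid / mad_scale(resid)
    effs = np.array([estimated_efficiency(r, c) for c in grid])
    return grid[int(np.argmax(effs))]

# Simulated (hypothetical) series: linear trend plus 10% upward outliers.
rng = np.random.default_rng(0)
n = 200
t = np.linspace(0.0, 1.0, n)
y = 2.0 + 3.0 * t + rng.normal(0.0, 0.3, n)
outliers = rng.choice(n, size=n // 10, replace=False)
y[outliers] += 8.0

X = np.column_stack([np.ones(n), t])
grid = np.linspace(0.5, 3.0, 26)
c_hat = select_c(X, y, grid)
beta_hat = huber_fit(X, y, c_hat)
```

With more contamination, the efficiency criterion tends to favor a smaller c (more downweighting); with clean, near-normal data it tends to favor a larger c, which brings the fit closer to least squares. Handling autoregressive errors would require an additional modeling step that this sketch omits.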
