Bridging Bayesian and Minimax Mean Square Error Estimation via Wasserstein Distributionally Robust Optimization

We introduce a distributionally robust minimum mean square error estimation model with a Wasserstein ambiguity set to recover an unknown signal from a noisy observation. The proposed model can be viewed as a zero-sum game between a statistician choosing an estimator (that is, a measurable function of the observation) and a fictitious adversary choosing a prior (that is, a pair of signal and noise distributions ranging over independent Wasserstein balls), with the goal of minimizing and maximizing the expected squared estimation error, respectively. We show that if the Wasserstein balls are centered at normal distributions, then the zero-sum game admits a Nash equilibrium, where the players' optimal strategies are given by an affine estimator and a normal prior, respectively. We further prove that this Nash equilibrium can be computed by solving a tractable convex program. Finally, we develop a Frank-Wolfe algorithm that can solve this convex program orders of magnitude faster than state-of-the-art general purpose solvers. We show that this algorithm enjoys a linear convergence rate and that its direction-finding subproblems can be solved in quasi-closed form.
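To make two of the ingredients above concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a simple additive observation model y = x + w with independent normal signal x and noise w (the abstract does not specify the observation model), and it uses the well-known closed form for the type-2 Wasserstein distance between two normal distributions. The function names `psd_sqrt`, `gaussian_w2`, and `affine_mmse` are hypothetical and chosen only for illustration.

```python
import numpy as np


def psd_sqrt(a):
    """Symmetric square root of a positive semidefinite matrix."""
    # Symmetrize to guard against round-off, then use the eigendecomposition.
    w, v = np.linalg.eigh((a + a.T) / 2.0)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T


def gaussian_w2(m1, s1, m2, s2):
    """Type-2 Wasserstein distance between N(m1, s1) and N(m2, s2).

    Closed form: W^2 = ||m1 - m2||^2 + tr(s1 + s2 - 2 (s2^{1/2} s1 s2^{1/2})^{1/2}).
    """
    r = psd_sqrt(s2)
    cross = psd_sqrt(r @ s1 @ r)
    sq = np.sum((m1 - m2) ** 2) + np.trace(s1 + s2 - 2.0 * cross)
    return float(np.sqrt(max(sq, 0.0)))  # clip tiny negative round-off


def affine_mmse(mu_x, cov_x, mu_w, cov_w):
    """Affine MMSE estimator x_hat = A y + b for the ASSUMED model y = x + w.

    x ~ N(mu_x, cov_x) and w ~ N(mu_w, cov_w) are independent, so
    A = cov_x (cov_x + cov_w)^{-1} and b = mu_x - A (mu_x + mu_w).
    """
    # cov_x + cov_w is symmetric, so solving and transposing gives A.
    A = np.linalg.solve(cov_x + cov_w, cov_x).T
    b = mu_x - A @ (mu_x + mu_w)
    return A, b


if __name__ == "__main__":
    d = 2
    mu_x, cov_x = np.zeros(d), np.eye(d)
    mu_w, cov_w = np.zeros(d), 0.5 * np.eye(d)

    A, b = affine_mmse(mu_x, cov_x, mu_w, cov_w)
    print("gain A:\n", A)
    print("offset b:", b)

    # Distance from the nominal signal distribution to an inflated candidate.
    print("W2:", gaussian_w2(mu_x, cov_x, mu_x, 1.5 * cov_x))
```

Under a normal prior the Bayesian MMSE estimator is affine in the observation, which is why an affine rule such as the one returned by `affine_mmse` is a natural candidate for the statistician's equilibrium strategy. The distance function indicates how membership of a normal candidate in a Wasserstein ball around a nominal normal distribution could be checked.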
