A Unified Approach to Universal Prediction: Generalized Upper and Lower Bounds

We study sequential prediction of real-valued, arbitrary, and unknown sequences under the squared error loss, measured relative to the best parametric predictor out of a large, continuous class of predictors. Inspired by recent results from computational learning theory, we refrain from any statistical assumptions and define performance with respect to this class of general parametric predictors. In particular, we present generic lower and upper bounds on this relative performance by transforming the prediction task into a parameter learning problem. We first introduce lower bounds on the relative performance in the mixture-of-experts framework, where we show that for any sequential algorithm there always exists a sequence on which its relative performance is lower bounded by zero; that is, no sequential algorithm can outperform the best predictor in the class on every sequence. We then introduce a sequential learning algorithm to predict such arbitrary and unknown sequences, and calculate upper bounds on its total squared prediction error for every bounded sequence. We further show that in some scenarios we achieve matching lower and upper bounds, demonstrating that our algorithms are optimal in a strong minimax sense: their performance cannot be improved further. Finally, we prove that in the worst-case scenario the performance of randomized-output algorithms can be matched by deterministic sequential algorithms, so randomization does not improve the worst-case performance.
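The "relative performance" above is the standard regret of a sequential predictor against the best fixed predictor in the class. A plausible formalization is sketched below; the symbols (the class \mathcal{W}, the parametric predictors f_w, and the past observations x^{t-1} = (x_1, \ldots, x_{t-1})) are introduced here for illustration and are not taken verbatim from the paper.

```latex
% Regret of the sequential predictions \hat{x}_t over n rounds against
% the best fixed parametric predictor f_w in the class \mathcal{W}:
R_n \;=\; \sum_{t=1}^{n} \big( x_t - \hat{x}_t \big)^2
      \;-\; \inf_{w \in \mathcal{W}} \sum_{t=1}^{n} \big( x_t - f_w(x^{t-1}) \big)^2
```

To make the "transform prediction into parameter learning" viewpoint concrete, here is a minimal sketch of one such sequential learner: projected online gradient descent over fixed-order linear predictors. This is an illustrative instance under assumed choices (the linear class, the eta0/sqrt(t+1) step-size schedule, and the projection radius are all assumptions), not the paper's specific algorithm.

```python
import numpy as np

def sequential_predict(x, order=2, radius=1.0, eta0=0.5):
    """Sketch of a sequential predictor: at each step t, predict x[t]
    from the last `order` samples with a linear parameter vector w,
    then update w by a gradient step on the squared error and project
    it back onto the ball of radius `radius` (the competing class)."""
    n = len(x)
    w = np.zeros(order)            # current parameter estimate
    preds = np.zeros(n)
    for t in range(n):
        # regressor: the most recent `order` samples, newest first
        past = np.zeros(order)
        k = min(t, order)
        if k > 0:
            past[:k] = x[t - k:t][::-1]
        preds[t] = w @ past        # sequential prediction of x[t]
        # gradient of (preds[t] - x[t])^2 with respect to w
        grad = 2.0 * (preds[t] - x[t]) * past
        w = w - (eta0 / np.sqrt(t + 1)) * grad
        # project w back into the parameter ball of the class
        norm = np.linalg.norm(w)
        if norm > radius:
            w *= radius / norm
    return preds
```

For bounded sequences, the projection step keeps the parameter estimate inside the same ball from which the competing predictors are drawn, which is the ingredient that makes a sequence-independent upper bound on the total squared prediction error possible.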
