Universal Switching Linear Least Squares Prediction

In this paper, we consider sequential regression of individual sequences under the square-error loss. We focus on the class of switching linear predictors that can segment a given individual sequence into an arbitrary number of blocks, within each of which a fixed linear regressor is applied. Using a competitive algorithm framework, we construct sequential algorithms that are competitive with the best linear regression algorithm for any segmentation of the data, as well as with the best partitioning of the data into any fixed number of segments, where both the segmentation and the linear predictors within each segment can be tuned to the underlying individual sequence. The algorithms do not require knowledge of the data length or of the number of piecewise linear segments used by the members of the competing class, yet they can achieve the performance of the best member, which may choose both the partitioning of the sequence and the best regressor within each segment. We use a transition diagram (F. M. J. Willems, 1996) to compete with an exponential number of algorithms in the class, using complexity that is linear in the data length. The regret with respect to the best member is O(ln(n)) per transition for not knowing the best transition times and O(ln(n)) for not knowing the best regressor within each segment, where n is the data length. We construct lower bounds on the performance of any sequential algorithm, demonstrating a form of min-max optimality under certain settings. We also consider the case where the members are restricted to choose the best algorithm in each segment from a finite collection of candidate algorithms. Performance results on synthetic and real data are given, along with a MATLAB implementation of the universal switching linear predictor.
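
As a rough illustration only, and not the paper's algorithm, the Python sketch below combines recursive least squares predictors, one started at each candidate transition time, using exponential weights under the square-error loss and a fixed switching rate. The names RLSPredictor and switching_lls and the parameters alpha and c are hypothetical choices for this sketch; it keeps one expert per start time and does not reproduce the paper's linear-complexity transition-diagram construction or its regret guarantees.

import numpy as np

class RLSPredictor:
    """Recursive least squares (RLS) linear predictor of order m."""
    def __init__(self, m, delta=1.0):
        self.w = np.zeros(m)                 # regressor coefficients
        self.P = np.eye(m) / delta           # inverse correlation estimate

    def predict(self, x):
        return float(self.w @ x)

    def update(self, x, d):
        Px = self.P @ x
        k = Px / (1.0 + x @ Px)              # RLS gain vector
        self.w = self.w + k * (d - self.w @ x)
        self.P = self.P - np.outer(k, Px)

def switching_lls(y, m=2, alpha=0.05, c=0.5):
    """Mix RLS 'experts', one per candidate segment start time, with
    exponential weights under the square-error loss and a fixed switching
    rate alpha (a simplified stand-in for transition-diagram weighting)."""
    n = len(y)
    yhat = np.zeros(n)
    experts, w = [], np.array([])
    for t in range(m, n):
        x = y[t - m:t][::-1]                 # regressor: the past m samples
        experts.append(RLSPredictor(m))      # hypothesis: a segment starts at t
        w = np.append(w * (1.0 - alpha), alpha if t > m else 1.0)
        w /= w.sum()
        preds = np.array([e.predict(x) for e in experts])
        yhat[t] = float(w @ preds)           # weighted mixture prediction
        w *= np.exp(-c * (y[t] - preds) ** 2)
        w /= w.sum()
        for e in experts:                    # every expert observes the sample
            e.update(x, y[t])
    return yhat

# Toy example: an AR(1) sequence whose coefficient switches once.
rng = np.random.default_rng(0)
n = 400
y = np.zeros(n)
for t in range(1, n):
    a = 0.9 if t < n // 2 else -0.7
    y[t] = a * y[t - 1] + 0.1 * rng.standard_normal()
yhat = switching_lls(y, m=2)
print("mean squared prediction error:", np.mean((y[2:] - yhat[2:]) ** 2))

In the toy example, the best fixed linear regressor is mismatched on one half of the data, so the switching mixture should track the change and achieve a lower cumulative squared error than any single regressor trained on the whole sequence.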

[1] M. Effros, et al., Universal quantization of parametric sources has redundancy k/2 log n/n, 1995, Proceedings of 1995 IEEE International Symposium on Information Theory.

[2] Jacob Ziv, et al., Coding of sources with unknown statistics-I: Probability of encoding error, 1972, IEEE Trans. Inf. Theory.

[3] Vladimir Vovk, et al., Competitive On-line Linear Regression, 1997, NIPS.

[4] P. Algoet, Universal Schemes for Prediction, Gambling and Portfolio Selection, 1992.

[5] Michelle Effros, et al., Rates of convergence in adaptive universal vector quantization, 1994, Proceedings of 1994 IEEE International Symposium on Information Theory.

[6] Jorma Rissanen, et al., Universal coding, information, prediction, and estimation, 1984, IEEE Trans. Inf. Theory.

[7] Mark Herbster, et al., Tracking the Best Linear Predictor, 2001, J. Mach. Learn. Res.

[8] Neri Merhav, et al., On the minimum description length principle for sources with piecewise constant parameters, 1993, IEEE Trans. Inf. Theory.

[9] Meir Feder, et al., On universal quantization by randomized uniform/lattice quantizers, 1992, IEEE Trans. Inf. Theory.

[10] Frans M. J. Willems, et al., Coding for a binary independent piecewise-identically-distributed source, 1996, IEEE Trans. Inf. Theory.

[11] Tamás Linder, et al., A zero-delay sequential scheme for lossy coding of individual sequences, 2001, IEEE Trans. Inf. Theory.

[12] Andrew C. Singer, et al., Universal Context Tree Least Squares Prediction, 2006, 2006 IEEE International Symposium on Information Theory.

[13] Yoram Singer, et al., Switching Portfolios, 1998, Int. J. Neural Syst.

[14] David Haussler, et al., How to use expert advice, 1993, STOC.

[15] Lee D. Davisson, et al., Universal noiseless coding, 1973, IEEE Trans. Inf. Theory.

[16] Andrew C. Singer, et al., Universal linear least squares prediction: Upper and lower bounds, 2002, IEEE Trans. Inf. Theory.

[17] Peter Auer, et al., Tracking the Best Disjunction, 1998, Machine Learning.

[18] Erik Ordentlich, et al., Universal portfolios with side information, 1996, IEEE Trans. Inf. Theory.

[19] Peter Elias, et al., Universal codeword sets and representations of the integers, 1975, IEEE Trans. Inf. Theory.

[20] David Luengo, et al., Universal piecewise linear least squares prediction, 2004, Proceedings of the 2004 IEEE International Symposium on Information Theory (ISIT 2004).

[21] Jorma Rissanen, et al., A universal data compression system, 1983, IEEE Trans. Inf. Theory.

[22] Abraham Lempel, et al., Compression of individual sequences via variable-rate coding, 1978, IEEE Trans. Inf. Theory.

[23] Tamás Linder, et al., Tracking the Best of Many Experts, 2005, COLT.

[24] Raphail E. Krichevsky, et al., The performance of universal encoding, 1981, IEEE Trans. Inf. Theory.

[25] Manfred K. Warmuth, et al., Tracking a Small Set of Experts by Mixing Past Posteriors, 2003, J. Mach. Learn. Res.

[26] David Haussler, et al., Sequential Prediction of Individual Sequences Under General Loss Functions, 1998, IEEE Trans. Inf. Theory.

[27] Neri Merhav, et al., Universal prediction of individual sequences, 1992, IEEE Trans. Inf. Theory.

[28] Manfred K. Warmuth, et al., The Weighted Majority Algorithm, 1994, Inf. Comput.

[29] Georg Zeitler, et al., Universal Piecewise Linear Prediction Via Context Trees, 2007, IEEE Trans. Signal Process.

[30] Manfred K. Warmuth, et al., Additive versus exponentiated gradient updates for linear prediction, 1995, STOC '95.

[31] James Hannan, Approximation to Bayes Risk in Repeated Play, 1958.

[32] Vladimir Vovk, et al., Derandomizing Stochastic Prediction Strategies, 1997, COLT '97.

[33] Mark Herbster, et al., Tracking the best regressor, 1998, COLT '98.

[34] David Haussler, et al., Tight worst-case loss bounds for predicting with expert advice, 1994, EuroCOLT.

[35] Glen G. Langdon, et al., Universal modeling and coding, 1981, IEEE Trans. Inf. Theory.

[36] Andrew C. Singer, et al., Universal linear prediction by model order weighting, 1999, IEEE Trans. Signal Process.

[37] Neri Merhav, et al., Low-complexity sequential lossless coding for piecewise-stationary memoryless sources, 1998, IEEE Trans. Inf. Theory.

[38] Vladimir Vovk, et al., Aggregating strategies, 1990, COLT '90.

[39] Vladimir Vovk, et al., A game of prediction with expert advice, 1995, COLT '95.

[40] Manfred K. Warmuth, et al., Exponentiated Gradient Versus Gradient Descent for Linear Predictors, 1997, Inf. Comput.

[41] Neri Merhav, et al., Universal Prediction, 1998, IEEE Trans. Inf. Theory.

[42] Dean Phillips Foster, Prediction in the Worst Case, 1991.

[43] Mark Herbster, et al., Tracking the Best Expert, 1995, Machine Learning.

[44] Philip Wolfe, Contributions to the theory of games, 1953.

[45] Suleyman S. Kozat, Competitive Signal Processing, 2004.