Adapting to Non-stationarity with Growing Expert Ensembles

When dealing with time series with complex non-stationarities, low retrospective regret on individual realizations is a more appropriate goal than low prospective risk in expectation. Online learning algorithms provide powerful guarantees of this form, and have often been proposed for use with non-stationary processes because of their ability to switch between different forecasters or "experts". However, existing methods assume that the set of experts whose forecasts are to be combined is given in full at the start, which is not plausible when dealing with a genuinely historical or evolutionary system. We show how to modify the "fixed shares" algorithm for tracking the best expert to cope with a steadily growing set of experts, obtained by fitting new models to new data as it becomes available, and obtain regret bounds for the growing ensemble.
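
To make the setup concrete, here is a minimal sketch in Python of a fixed-shares forecaster that admits new experts over time. The class name `GrowingFixedShares`, the parameter values, and the rule of giving a newly added expert a 1/(N+1) share of the weight are illustrative assumptions, not necessarily the exact construction analyzed in the paper; the exponential-weights update and the uniform share step follow the standard fixed-shares algorithm of Herbster and Warmuth.

```python
import numpy as np

class GrowingFixedShares:
    """Sketch of fixed-shares expert tracking with a growing expert set.

    eta is the learning rate for the exponential-weights update; alpha is
    the fraction of probability mass redistributed uniformly each round,
    which is what lets the ensemble switch quickly between experts.
    """

    def __init__(self, eta=0.5, alpha=0.01):
        self.eta = eta
        self.alpha = alpha
        self.weights = np.array([])  # one weight per expert, kept normalized

    def add_expert(self, new_share=None):
        """Admit a new expert; by default give it 1/(N+1) of the mass.

        The 1/(N+1) default is an illustrative choice for how much weight
        a newly fitted model should start with.
        """
        n = len(self.weights)
        if n == 0:
            self.weights = np.array([1.0])
            return
        share = 1.0 / (n + 1) if new_share is None else new_share
        self.weights = np.append(self.weights * (1.0 - share), share)

    def predict(self, expert_forecasts):
        """Combine the current experts' forecasts by their weights."""
        return float(np.dot(self.weights, expert_forecasts))

    def update(self, losses):
        """Exponential-weights update followed by the fixed-shares step."""
        losses = np.asarray(losses, dtype=float)
        v = self.weights * np.exp(-self.eta * losses)
        v /= v.sum()
        # Share step: each expert keeps (1 - alpha) of its mass; the rest
        # is spread uniformly, so no expert's weight ever hits zero.
        self.weights = (1.0 - self.alpha) * v + self.alpha / len(v)
```

A typical round interleaves prediction, loss feedback, and the occasional arrival of a new model fit to the latest data:

```python
fs = GrowingFixedShares(eta=0.5, alpha=0.05)
fs.add_expert()              # first model
fs.add_expert()              # second model
pred = fs.predict([0.2, 0.8])
fs.update([0.1, 0.4])        # bounded losses, e.g. in [0, 1]
fs.add_expert()              # a new model fit to newly arrived data
```

The share step matters precisely in the non-stationary setting the abstract describes: without it, an expert that performs badly early on loses weight exponentially and can never recover, whereas mixing in a small uniform mass each round keeps late-blooming experts recoverable.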
