Model Change Detection With the MDL Principle

We are concerned with the problem of detecting model changes in probability distributions. We specifically consider strategies based on the minimum description length (MDL) principle and theoretically analyze their basic performance from two aspects: data compression and hypothesis testing. From the viewpoint of data compression, we derive a new bound on the minimax regret for model changes, where the minimax regret is defined as the minimum of the worst-case code length relative to the smallest normalized maximum likelihood (NML) code length over all model changes. From the viewpoint of hypothesis testing, we reduce model change detection to a simple hypothesis-testing problem and thereby derive upper bounds on the error probabilities of the MDL-based model change test. These bounds are valid for finite sample sizes and are characterized by the information-theoretic complexity of the hypotheses to be tested as well as a measure of discrepancy between them.
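To make the MDL-based test concrete, the following sketch (not taken from the paper; a minimal illustration assuming a Bernoulli data stream and natural-log code lengths) computes an NML-based change statistic: the code-length saving obtained by encoding the stream with a single change point rather than without one. A change is declared when the saving exceeds a threshold; the function names and the threshold value are illustrative.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def bernoulli_parametric_complexity(n):
    """Log of the NML normalizer C_n = sum_k C(n,k) (k/n)^k ((n-k)/n)^(n-k)."""
    total = 0.0
    for k in range(n + 1):
        p = k / n
        total += math.comb(n, k) * (p ** k) * ((1 - p) ** (n - k))
    return math.log(total)


def nml_code_length(x):
    """NML code length (in nats) of a 0/1 sequence under the Bernoulli model."""
    n, k = len(x), sum(x)
    if n == 0:
        return 0.0
    p = k / n
    # Maximum-likelihood code length -log p_hat(x); 0*log 0 is treated as 0.
    ml = -(k * math.log(p) if k else 0.0) - ((n - k) * math.log(1 - p) if n - k else 0.0)
    return ml + bernoulli_parametric_complexity(n)


def mdl_change_statistic(x, t):
    """Code-length saving from encoding x with a change point at t versus no change."""
    return nml_code_length(x) - (nml_code_length(x[:t]) + nml_code_length(x[t:]))


def detect_change(x, threshold=5.0):
    """Return (change point, score) maximizing the statistic, or None if below threshold."""
    scores = {t: mdl_change_statistic(x, t) for t in range(1, len(x))}
    t_star = max(scores, key=scores.get)
    return (t_star, scores[t_star]) if scores[t_star] > threshold else None


if __name__ == "__main__":
    data = [0] * 50 + [1] * 50  # toy stream whose mean shifts at t = 50
    print(detect_change(data))
```

The threshold plays the role of the test's rejection level; the error bounds analyzed in the paper relate such a threshold to the complexity of the hypotheses being compared.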
