Detecting entropy increase in categorical data using maximum entropy distribution approximations

ABSTRACT We propose a statistical monitoring method for detecting increases in entropy in categorical data. First, we develop a distribution estimation method that approximates the probability distribution of the observed categorical data. The estimation is formulated as a convex optimization problem: find the distribution that maximizes Shannon's entropy subject to constraints defined by given confidence intervals on the possible distributions. We then use this procedure to estimate the nonparametric maximum entropy distribution of an observed data sample, and use that estimate for statistical monitoring based on a χ²-test statistic. In various numerical studies and a real-world case study, the monitoring scheme was effective in detecting entropy increases in the observed data.
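
The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the confidence intervals are taken to be per-category Wald (normal-approximation) intervals, the constraint set is the resulting box intersected with the probability simplex, and the monitoring statistic is Pearson's χ² of a new sample against the fitted distribution. Under those assumptions the convex problem has a simple solution: every probability equals a common level clipped to its interval, and the level is found by bisection. All function names are hypothetical. Note that a Wald interval collapses to zero width for a category with zero count, which this sketch rejects rather than handles.

```python
import math

def wald_intervals(counts, z=1.96):
    """Per-category Wald intervals (an assumed choice of confidence interval)."""
    n = sum(counts)
    bounds = []
    for c in counts:
        p = c / n
        half = z * math.sqrt(p * (1.0 - p) / n)
        bounds.append((max(0.0, p - half), min(1.0, p + half)))
    return bounds

def max_entropy_in_box(bounds, tol=1e-12):
    """Maximize Shannon entropy subject to l_i <= p_i <= u_i and sum(p) = 1.

    The optimum has the form p_i = clip(level, l_i, u_i); the clipped sum is
    nondecreasing in `level`, so bisection finds the level where it equals 1.
    """
    if sum(l for l, _ in bounds) > 1.0 or sum(u for _, u in bounds) < 1.0:
        raise ValueError("intervals do not contain any probability vector")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        total = sum(min(max(mid, l), u) for l, u in bounds)
        if total < 1.0:
            lo = mid
        else:
            hi = mid
    level = 0.5 * (lo + hi)
    return [min(max(level, l), u) for l, u in bounds]

def shannon_entropy(p):
    """Shannon entropy in nats, with 0 * log(0) treated as 0."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def pearson_statistic(counts, p):
    """Pearson chi-square of observed counts against expected counts n * p_i."""
    n = sum(counts)
    if any(x <= 0.0 for x in p):
        raise ValueError("expected counts must be positive")
    return sum((c - n * x) ** 2 / (n * x) for c, x in zip(counts, p))

# Hypothetical in-control sample with four categories.
baseline_counts = [50, 30, 15, 5]
reference = max_entropy_in_box(wald_intervals(baseline_counts))

# A new, more uniform (higher-entropy) sample scored against the reference.
new_counts = [25, 25, 25, 25]
statistic = pearson_statistic(new_counts, reference)
print(reference, shannon_entropy(reference), statistic)
```

In a monitoring setting the statistic would be compared with a control limit, for example a χ² quantile with the appropriate degrees of freedom. How that limit is chosen is not stated in the abstract and is left open here.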
