Multiple window discrete scan statistic for higher-order Markovian sequences

Accurate and efficient methods to detect unusual clusters of abnormal activity are needed in many fields, such as medicine and business. Often the size of the clusters is unknown; hence, multiple (variable) window scan statistics are used to identify clusters over a set of different potential cluster sizes. We give an efficient method to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. We define a Markov chain that efficiently keeps track of the probabilities needed to compute p-values for the statistic. The state space of the Markov chain is constructed using a criterion that identifies the strings associated with observing the specified values of the statistic. Using our algorithm, we identify cases where the available approximations do not perform well. We demonstrate our methods by detecting unusual clusters of made free throws by National Basketball Association players during the 2009–2010 regular season.
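To make the idea of tracking probabilities through an auxiliary Markov chain concrete, the following is a minimal Python sketch under strong simplifying assumptions: a two-state (0/1), first-order Markov sequence, with the auxiliary chain's state taken naively to be the most recent outcomes up to the largest window length. This is not the reduced state space described above, which is built from a criterion on strings; the function name `multi_window_scan_pvalue` and its parameters (`windows`, `init`, `trans`) are hypothetical and chosen only for illustration.

```python
def multi_window_scan_pvalue(n, windows, init, trans):
    """Exact P(some length-w window among n trials has >= k successes,
    for at least one (w, k) pair in `windows`).

    Assumptions (illustrative only): outcomes are 0/1, the sequence is a
    first-order Markov chain with init[s] = P(X_1 = s) and
    trans[a][b] = P(X_{t+1} = b | X_t = a).
    """
    max_w = max(w for w, _ in windows)

    def hit(hist):
        # Check only complete windows ending at the current trial.
        return any(len(hist) >= w and sum(hist[-w:]) >= k for w, k in windows)

    live = {}      # non-absorbed states: recent history -> probability
    p_hit = 0.0    # probability mass absorbed (a cluster has been seen)

    for s in (0, 1):
        h = (s,)
        if hit(h):
            p_hit += init[s]
        else:
            live[h] = live.get(h, 0.0) + init[s]

    for _ in range(n - 1):
        nxt = {}
        for h, p in live.items():
            for s in (0, 1):
                q = p * trans[h[-1]][s]
                if q == 0.0:
                    continue
                h2 = (h + (s,))[-max_w:]  # keep at most max_w recent outcomes
                if hit(h2):
                    p_hit += q
                else:
                    nxt[h2] = nxt.get(h2, 0.0) + q
        live = nxt
    return p_hit


# Example: fair i.i.d. trials, n = 3, one window of length 2 needing 2 successes.
iid = [[0.5, 0.5], [0.5, 0.5]]
print(multi_window_scan_pvalue(3, [(2, 2)], [0.5, 0.5], iid))
```

The naive state space grows exponentially with the largest window length, which is the cost a more selective choice of states is meant to avoid.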
