Fast and Flexible Outbreak Detection by Linear-Time Subset Scanning

The spatial scan statistic [1] detects significant spatial clusters of disease by maximizing a likelihood ratio statistic over a large set of spatial regions. Typical spatial scan approaches either constrain the search regions to a given shape, reducing power to detect patterns that do not correspond to this shape, or perform a heuristic search over a larger set of irregular regions, in which case they may not find the most relevant clusters. In either case, computation time is a serious issue when searching over complex region shapes or when analyzing a large amount of data. An alternative approach might be to search over all possible subsets of the data to find the most relevant patterns, but since there are exponentially many subsets, an exhaustive search is computationally infeasible.