Change detection under autocorrelation

Maarten Speekenbrink (m.speekenbrink@ucl.ac.uk), Matthew A. Twyman (m.twyman@ucl.ac.uk), Nigel Harvey (n.harvey@ucl.ac.uk)
Cognitive, Perceptual and Brain Sciences, University College London
Gower Street, London WC1E 6BT, England

Abstract

Judgmental detection of changes in time series is a ubiquitous task. Previous research has shown that human observers are often relatively poor at detecting change, especially when the series are serially dependent (autocorrelated). We present two experiments in which participants were asked to judge the occurrence of changes in time series with varying levels of autocorrelation. Results show that autocorrelation increases the difficulty of discriminating change from no change, and that observers respond to this increased difficulty by biasing their decisions towards change. This results in increased false alarm rates, while leaving hit rates relatively intact. We present a rational (Bayesian) model of change detection and compare it to two heuristic models that ignore autocorrelation in the series. Participants appeared to rely on a simple heuristic, whereby they first visually match a change function to a series and then determine whether the putative change exceeds the variability in the data.

Keywords: change detection; judgment; forecasting

Introduction

Detecting changes in time series is a surprisingly ubiquitous task. Doctors and therapists monitor diagnostic indicators for signs of disease onset and for evidence that a prescribed treatment is effective; farmers monitor soil conditions to decide whether additional irrigation is necessary; local authorities monitor river levels for increased likelihood of flooding; probation officers monitor probationers' behaviour for evidence of return to crime; financiers monitor data, such as exchange rates, for signs of trend reversal. Many other examples could be given. As with forecasting and control tasks, monitoring tasks may be tackled by formal statistical methods, by judgment alone, or by some combination of the two. The method most favoured depends to a large extent on the domain. Typically, implementation of and training in formal methods consume more resources (time, money, effort), but the investment may be worthwhile if those methods have considerable benefits over judgment in terms of accuracy. Thus, it would be useful to know just how good human judgment is relative to formal methods.

There are many formal statistical methods for detecting change in time series (e.g., Albert & Chib, 1993; Carlin, Gelfand, & Smith, 1992; Hamilton, 1990). This variety arises partly because some approaches represent the event producing the regime change as deterministic whereas others represent it as a random variable, and partly because, whichever of these approaches is adopted, there is still some debate about how best to estimate the likelihood that a change has occurred.

In contrast, there has been very little research into judgmental assessment of regime change. Originally, behavioural psychologists working within the Skinnerian tradition used judgment (visual inference) to assess whether a manipulation changed some aspect of an animal's behaviour represented as a time series. They argued that this is a conservative approach because only large effects can be detected (e.g., Baer, 1977). Their claims were not directly tested.
However, when behaviour analysts later used the same approach to assess human patients, there was concern that the shorter pre-treatment baselines in the series impaired visual inference. As a result, some experiments were carried out to investigate how accurately people can detect change.

Judgmental change detection and autocorrelation

Jones, Weinrott, and Vaught (1978) found that people were poor at detecting change in real series: inter-rater reliability of judgments was low at .39, and average miss and false alarm rates were 48% and 33%, respectively. Sequential dependence (autocorrelation) in series increased false alarm rates. This study used interrupted time series analysis as the gold standard for establishing whether there was a real change in the series. However, the series were so short that this statistical approach would have lacked power. People may have been able to detect changes that the statistical analysis could not; if so, their performance may not have been as bad as it appeared to be. To circumvent this problem, Matyas and Greenwood (1990) simulated series with known levels of random noise and first-order autocorrelation. However, they still found that false alarm rates (typically over 40%) were much higher than miss rates (typically about 10%), especially when data were autocorrelated. They concluded that judgment is not as conservative as behaviour analysts assumed.

The increase in false alarm rates under positive autocorrelation is problematic. In single-subject research, where visual assessment of change is still the dominant method (Brossart, Parker, Olson, & Mahadevan, 2006), there is positive autocorrelation in the large majority of series (Busk & Marascuilo, 1988).

Why does autocorrelation impair change detection? Consider a time series y_{1:T} = (y_1, ..., y_T) which follows an r-th order autoregressive process

    y_t = µ_t + ∑_{k=1}^{r} α_k (y_{t−k} − µ_{t−k}) + e_t,    e_t ∼ N(0, σ²_e)

This process implies a serial dependence between successive time points such that when a previous value y_{t−k} is above the mean µ_{t−k}, a later value y_t is more likely to also be above the mean (for α_k > 0, positive autocorrelation), or more likely to be below the mean (for α_k < 0, negative autocorrelation).
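To make the process concrete, the sketch below (ours, not from the original paper; the function name and parameter values are illustrative) simulates a first-order (r = 1) version of this autoregressive process with an optional step change in the mean µ_t:

```python
import numpy as np

def simulate_ar1(T=100, alpha=0.7, sigma_e=1.0,
                 change_point=None, change_size=0.0, seed=None):
    """Simulate y_t = mu_t + alpha * (y_{t-1} - mu_{t-1}) + e_t, with
    e_t ~ N(0, sigma_e^2); mu_t steps from 0 to change_size at change_point."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(T)
    if change_point is not None:
        mu[change_point:] = change_size
    y = np.empty(T)
    y[0] = mu[0] + rng.normal(0.0, sigma_e)
    for t in range(1, T):
        # The deviation from the mean at t-1 carries over with weight alpha,
        # giving positive serial dependence for alpha > 0.
        y[t] = mu[t] + alpha * (y[t - 1] - mu[t - 1]) + rng.normal(0.0, sigma_e)
    return y

# With alpha > 0, a no-change series can drift away from its mean for long
# stretches, which is easy to mistake for a genuine change (a false alarm).
no_change = simulate_ar1(alpha=0.7, seed=1)
with_change = simulate_ar1(alpha=0.7, change_point=50, change_size=2.0, seed=1)
print(np.corrcoef(no_change[:-1], no_change[1:])[0, 1])  # lag-1 autocorrelation
```

Printing the lag-1 correlation of the no-change series shows the serial dependence that, in the studies reviewed above, inflated judgmental false alarm rates.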