Predicting Human Performance Differences on Multiple Interface Alternatives: KLM, GOMS and CogTool are Unreliable

Abstract
Cognitive modeling tools, such as KLM, GOMS and CogTool, can be used to predict human performance on interface designs before they are implemented and without the need for user testing. The model predictions can inform interface design because they allow designers to compare multiple interface alternatives quantitatively. However, little research has been done to determine how accurately cognitive modeling tools can predict human performance differences between interface alternatives. It is also unclear whether the results of different modeling tools differ to a practically significant degree. The goal of this study was to evaluate the accuracy of KLM, GOMS and CogTool for predicting human performance differences on multiple interface alternatives. Three tasks on three interface alternatives were modeled using KLM, GOMS and CogTool. The predictions of each tool were compared to performance data from 20 expert users performing the tasks on the interfaces. For all tasks and all modeling tools, the model-predicted trend did not correspond to the trend in the human performance data. For the six statistically significant differences between the interfaces, all tools predicted the direction of the difference correctly in four cases and incorrectly in two cases. The average difference between the predicted and the observed magnitude of difference between the interfaces was 5.49 s for KLM (range: 0.8 to 13.35 s), 3.98 s for GOMS (range: 0.8 to 9.75 s) and 3.49 s for CogTool (range: 0.13 to 10.65 s). These differences between the tools were not statistically significant. In conclusion, KLM, GOMS and CogTool cannot reliably predict human performance differences on multiple interface alternatives. Our results indicate that if the models predict faster performance on interface A than on interface B, humans actually perform faster on interface B than on interface A in one third of the cases. This raises questions about the validity of these cognitive modeling tools in interface design practice.
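
The two accuracy measures reported above, direction of difference and magnitude of difference, can be illustrated with a minimal sketch. It assumes one plausible reading of the comparison: for each pair of interfaces, the predicted difference in task time is compared with the observed difference in mean task time. The function name `compare_differences` and all numbers are hypothetical and serve illustration only; they are not data from the study, and the sketch omits the significance testing used to select the differences that were analyzed.

```python
from itertools import combinations


def compare_differences(predicted, observed):
    """Compare predicted and observed time differences for each interface pair.

    Both arguments map an interface label to a task time in seconds.
    """
    results = []
    for a, b in combinations(sorted(predicted), 2):
        pred_diff = predicted[a] - predicted[b]
        obs_diff = observed[a] - observed[b]
        results.append({
            "pair": (a, b),
            # Direction is correct when both differences have the same sign
            # (assumes neither difference is exactly zero).
            "direction_correct": (pred_diff > 0) == (obs_diff > 0),
            # Gap between predicted and observed difference, in seconds.
            "magnitude_error": abs(pred_diff - obs_diff),
        })
    return results


# Hypothetical task times in seconds (illustrative only, not study data).
predicted = {"A": 10.0, "B": 12.0, "C": 15.0}
observed = {"A": 13.0, "B": 11.0, "C": 16.0}

results = compare_differences(predicted, observed)
n_correct = sum(r["direction_correct"] for r in results)
mean_error = sum(r["magnitude_error"] for r in results) / len(results)

print(f"direction correct in {n_correct} of {len(results)} pairs")
print(f"mean magnitude error: {mean_error:.2f} s")
```

With these made-up values, the model ranks interface A ahead of B while the observed times rank B ahead of A, which is the kind of reversal the abstract describes.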