Deriving Software Usage Patterns from Log Files

Log files (discrete recordings of user actions during software use) offer the ability to collect human-computer interaction data on a number of users, over time, while the users are engaged in typical tasks in typical environments. The disadvantage of log files is the lack of automated methods for analyzing the volumes of data in a meaningful way. This paper presents a log file analysis tool, Hawk, and discusses the characteristics that make it useful for this task. It also describes a particular analysis technique, based on Markov chain analysis, that can be used to derive high-level software usage patterns. A study of student interactions with a programming environment is used to illustrate both the tool and the technique.

Log files (that is, discrete recordings of user actions during software use) have several characteristics that make them ideal for research on the design of user interfaces and on the interactions between humans and computers. They can be used to collect data on any number of users over time, during each and every use of the software. Particularly important, log files can capture data while users are working on typical tasks in typical environments. More traditional methods for gathering data on human-computer interactions, such as think-aloud protocols, require unusual settings that can confound the analyses.

The disadvantage of log file data is that few techniques exist for analyzing it. Log file data tends to be voluminous and is often at too low a level (e.g., individual keystrokes) to be of much use without some aggregation, which suggests the need for automated analyses. Researchers in hypertext and hypermedia have developed techniques and tools for using log files to trace access and then to assess student concept formation in terms of the information accessed. Log file analysis techniques for exploring user interactions with other forms of software are rarer.
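To make the Markov chain idea concrete, the following is a minimal sketch and not Hawk's actual implementation. It assumes the log has already been aggregated from keystrokes into a time-ordered sequence of high-level action labels (the labels `edit`, `compile`, and `run`, and the function name `transition_probabilities`, are hypothetical). It estimates first-order transition probabilities P(next | current) by counting adjacent pairs of actions and normalizing each row of counts.

```python
from collections import Counter, defaultdict


def transition_probabilities(events):
    """Estimate first-order Markov transition probabilities P(next | current).

    `events` is one user's logged sequence of action labels, in time order
    (hypothetical format). Each adjacent pair (current, next) is counted once,
    and each row of counts is normalized so the probabilities leaving a
    state sum to 1.
    """
    pair_counts = defaultdict(Counter)
    for current, nxt in zip(events, events[1:]):
        pair_counts[current][nxt] += 1

    probs = {}
    for current, counter in pair_counts.items():
        total = sum(counter.values())
        probs[current] = {nxt: n / total for nxt, n in counter.items()}
    return probs


# Hypothetical log, already aggregated from keystrokes into high-level actions.
log = ["edit", "compile", "edit", "compile", "run", "edit", "compile", "run"]
P = transition_probabilities(log)
for state in sorted(P):
    print(state, P[state])
```

For this toy log, `compile` is followed by `run` in two of its three transitions, so `P["compile"]["run"]` is 2/3, while `edit` is always followed by `compile`. The final event has no successor and contributes no transition. High-probability chains of transitions are one way such a matrix could point to recurring usage patterns.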
Hammer and Rouse (1979) used Markov chain analysis (a standard analysis tool) to study keystrokes in editing text files, but found the technique to be too sensitive to individual differences. Winne and Gupta (1993) have identified a number of powerful measures for evaluating log files of students using a computer-based study aid. The techniques of Winne and Gupta are based on data and graph theory, and they