To justify use of a simulation modeling framework in designing a human-machine system, engineers need some reasonable expectation that doing so will prove beneficial. For developers of modeling frameworks, this presents two main challenges. First, the framework must be able to predict design-relevant aspects of human performance; that is, it must make predictions that designers care about. Second, the time and expertise required to use the framework must be kept within practical limits. This paper describes efforts to address these problems using the APEX modeling framework.

When designing complex devices such as integrated circuits and automobile engines, engineers routinely use computer simulation to predict how well a device would function if actually built. By helping to detect problems at an early stage of the design process, simulation postpones or eliminates the need for a physical prototype. Engineering costs decrease in numerous ways, yielding improved reliability, greater innovation, faster development time, and lower overall cost of development.

While a routine part of the design process for some devices, simulation is hardly ever used to help design human-machine systems. There are two main reasons for this, both stemming from the difficulty of modeling the human components of these systems. First, available frameworks for modeling human performance often prove unsatisfactory because they are incomplete in some crucial way or cannot adequately predict design-relevant aspects of human performance. Second, using such a framework requires a great deal of time and effort to prepare application-specific elements of the simulation, including models of newly designed devices and formal descriptions of “how-to” knowledge for operating in the domain of interest. The limitations of most human models often make it unlikely that this investment of time and effort will pay off.

Despite these limitations, human operator models have occasionally proven effective in practical terms (John and Kieras, 1994). The best-known example is Project Ernestine (Gray et al., 1993), which used the GOMS framework to predict how long, on average, a NYNEX telephone operator would require to handle a customer transaction using newly designed equipment and procedures. The model accurately predicted a time 0.63 seconds greater than that required with the old equipment. With each second of average transaction time costing NYNEX three million dollars annually, purchasing the new equipment would have been a costly mistake.

The value added by a modeling effort depends on how the effort is integrated into the design process and on what evaluation methods would otherwise be employed. For instance, modeling can be used to direct limited empirical evaluation resources to likely problems, thereby increasing the likelihood that important problems will be detected. Alternately, it can be used to find problems that would otherwise be detected at a later stage in the design process, when implementing a design fix is typically more expensive.
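The economics behind the Ernestine example are worth making explicit. The sketch below works through the arithmetic using only the two figures quoted above; the constant and function names are our own illustration, not part of any published GOMS tooling.

```python
# Back-of-the-envelope cost projection for the Project Ernestine example.
# The only inputs are the two figures quoted in the text; everything else
# (names, structure) is illustrative.

COST_PER_SECOND_PER_YEAR = 3_000_000  # dollars per second of average transaction time
PREDICTED_SLOWDOWN_S = 0.63           # predicted extra seconds per transaction

def projected_annual_cost(slowdown_s: float,
                          cost_per_second: float = COST_PER_SECOND_PER_YEAR) -> float:
    """Annual cost attributable to a change in average transaction time."""
    return slowdown_s * cost_per_second

if __name__ == "__main__":
    cost = projected_annual_cost(PREDICTED_SLOWDOWN_S)
    print(f"Projected annual cost of new workstations: ${cost:,.0f}")
    # -> Projected annual cost of new workstations: $1,890,000
```

On these figures, a 0.63-second slowdown per transaction projects to roughly $1.9 million per year, which is why a seemingly small timing prediction was decision-relevant.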
Design-relevant predictions

To add value to a design-engineering process, a human-operator model must be able to predict aspects of human performance that are important to designers. Which aspects are relevant will vary depending on what constitutes good performance for the human-machine system as a whole. For example, in designing equipment for telephone operators as described above, the important system performance variable was the average time required to complete a single transaction; GOMS, whose traditional strength has been predicting completion time for brief, routine tasks, thus proved a useful and appropriate framework. Other frameworks have been designed to predict, for example, how quickly skilled performance will emerge after learning a task (Newell, 1990), whether a task is likely to impose excessive workload (Corker and Smith, 1993), whether the anthropometric properties of an interface (e.g., the reachability of controls) are human-compatible, and whether multiple operators are likely to cross-check one another’s behavior (MacMillan et al., 1997).

APEX is a GOMS-like framework that incorporates mechanisms and methodologies for predicting certain forms of human error. The need for models that can predict error has often been noted in the literature on human modeling (Olson and Olson, 1989; Reason, 1990; John and Kieras, 1994), but little progress has been made in incorporating error-prediction capabilities into a practically useful modeling framework (although see Kitajima and Polson, 1995):

“...at this time, research on human errors is still far from providing more than the familiar rough guidelines concerning the prevention of user error. No prediction methodology, regardless of the theoretical approach, has yet been developed and recognized as satisfactory.” (John and Kieras, 1994)

In light of the apparent absence of adequate scientific theories, it is worth considering what engineers currently do to prevent design-induced operator error. Current practices can be divided into those applied at a late stage of the design process and those applicable at an early stage. Late in the process, engineers can test a physical prototype with live users; this can be very effective but, like other late-stage techniques, very expensive. At an early stage, designers must rely on a combination of informal “common sense” knowledge and the explicit design guidelines found in any of a number of engineering handbooks (e.g., Smith and Mosier, 1986). Research on error prevention generally focuses on improving techniques for user testing or on adding to and refining design guidelines.

With APEX, we have tried the unusual approach of attempting to enhance the contribution of common-sense knowledge to the design process (Freed and Remington, 1998). People tend to apply their informal understanding of human error in an unsystematic way, causing them to overlook predictable problems. By incorporating this informal and relatively crude understanding in a model, APEX can apply that understanding systematically across a large number of simulated scenarios, as sketched below. Design problems that might otherwise be overlooked, only to seem obvious in hindsight, can thus be predicted early in the design process.
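To illustrate what such a systematic sweep might look like, here is a minimal sketch in which a crude, common-sense error rule is evaluated over every combination of scenario attributes for a hypothetical self-service kiosk. The attributes, the predicts_omission rule, and all names are our own illustrations, not APEX’s actual vocabulary or interfaces.

```python
from itertools import product

# Hypothetical scenario space for a self-service kiosk under design.
# The attributes and the crude "common sense" error rule below are
# illustrative stand-ins, not APEX's actual mechanisms.
TRANSACTIONS = ["purchase", "refund", "account-lookup"]
FINAL_STEP_PLACEMENT = ["before-goal", "after-goal"]  # is the last required step on the critical path?
USER_STATES = ["hurried", "unhurried", "distracted"]

def predicts_omission(transaction: str, placement: str, user: str) -> bool:
    """Crude rule: a required step that comes after the user's main goal
    is satisfied is at risk of omission, more so under time pressure."""
    return placement == "after-goal" and user in ("hurried", "distracted")

# Brute-force sweep: examine every combination rather than a few
# hand-picked test cases.
for scenario in product(TRANSACTIONS, FINAL_STEP_PLACEMENT, USER_STATES):
    if predicts_omission(*scenario):
        print("possible design-induced omission:", scenario)
```

The point of the sweep is coverage: even a rule this crude, applied exhaustively, flags combinations of mode and operating condition that an engineer inspecting a handful of cases might never examine.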
For example, consider the task of withdrawing cash from an automatic teller machine (ATM). Withdrawals from most current ATMs involve a sequence of actions that begins with the user inserting a magnetic card and ends with collecting the requested cash and then retrieving the card. A well-known problem with the use of ATMs is the frequency with which users take their money but forget to retrieve their cards. Some newer ATMs avoid this problem by inverting the order of the final two steps, forcing users to retrieve their cards before the machine dispenses the requested money. This way of compensating for human forgetfulness is what Donald Norman (1988) calls a “forcing function”: by placing an easily omitted task step on the critical path to achieving a goal, the designer drastically reduces the likelihood of omission.

Failing to retrieve one’s bank card from an ATM is an example of a postcompletion error (Byrne and Bovair, 1997), a general class of error in which a person omits a subtask that arises in service of a main task but is not on the critical path to achieving the main task’s goal. These errors are common in all sorts of everyday tasks; other examples include failing to retrieve the original from a photocopier and failing to replace an automobile’s fuel cap after refueling. People with no training in human factors or psychology recognize this pattern of error on the basis of common-sense knowledge; that is, their explanations of such errors resemble the definition given above.

Despite this level of intuitive understanding, a number of factors make it easy for engineers to overlook these and other common forms of operator error. For instance, a system may be used in a wide variety of operating conditions, only a small fraction of which invite error. Similarly, a complex system may have many associated modes, configurations, and functions, greatly widening the range of situations in which error may be inadvertently facilitated by the design. This potentially vast range of operating conditions and system states makes it difficult for engineers to consider potential errors systematically. Computer simulation makes it possible to overcome the difficulty, essentially by brute force: rather than selecting a few test cases and hoping they are representative and revealing, the engineer uses simulation to explore many cases.

The approach to predicting errors in APEX, described in detail elsewhere (Freed, 1998a; Freed and Remington, 1998), consists mainly of two components. First, the model incorporates certain heuristic decision-making biases. Postcompletion errors, for example, emerge from a heuristic for ending ongoing tasks: a task is considered complete when its main goal is satisfied. Such heuristics are generally good gambles, but prescribe incorrect behavior in some situations. Second, the model incorporates a number of mechanisms that suppress reliance on fallible heuristics. For example, mentally rehearsing an intention (e.g., to retrieve one’s ATM card) temporarily suppresses reliance on the termination heuristic, reducing the chance that the corresponding step will be omitted.
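The following minimal sketch shows how these two components can interact, under our own simplifying assumptions; the Task class, the step list, and the rehearsal bookkeeping are illustrative stand-ins rather than APEX’s actual mechanisms (for those, see Freed, 1998a).

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    """A main task with ordered steps; goal_step is the step that satisfies
    the main goal. Steps after it are off the critical path."""
    steps: list
    goal_step: str
    rehearsed: set = field(default_factory=set)  # intentions protected from the heuristic

def execute(task: Task) -> list:
    """Run steps in order, applying the fallible termination heuristic:
    once the main goal is satisfied, remaining steps are dropped unless
    a rehearsed intention temporarily suppresses the heuristic."""
    done, goal_satisfied = [], False
    for step in task.steps:
        if goal_satisfied and step not in task.rehearsed:
            break  # heuristic fires: task "feels" complete -> postcompletion error
        done.append(step)
        if step == task.goal_step:
            goal_satisfied = True
    return done

atm = ["insert-card", "enter-pin", "request-amount", "take-cash", "take-card"]

# Without rehearsal, the final step is omitted (a postcompletion error)...
print(execute(Task(atm, goal_step="take-cash")))
# ...with a rehearsed intention, the heuristic is suppressed for that step.
print(execute(Task(atm, goal_step="take-cash", rehearsed={"take-card"})))
```

Running the sketch drops take-card in the unrehearsed case and retains it when the intention is rehearsed, mirroring the ATM example above.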
References

[1] John E. Laird et al. Towards the knowledge level in Soar: the role of the architecture in the use of knowledge. 1993.
[2] Lawrence Birnbaum et al. Simulating human performance in complex, dynamic environments. 1998.
[3] John W. Senders et al. The Human Operator as a Monitor and Controller of Multidegree of Freedom Systems. 1964.
[4] John R. Anderson. The Adaptive Character of Thought. 1990.
[5] John R. Anderson et al. Rules of the Mind. 1993.
[6] Kevin M. Corker et al. An Architecture and Model for Cognitive Engineering Simulation Analysis: Application to Advanced Aviation Automation. 1993.
[7] David E. Kieras et al. An Overview of the EPIC Architecture for Cognition and Performance With Application to Human-Computer Interaction. Human-Computer Interaction, 1997.
[8] Allen Newell. Unified Theories of Cognition. 1990.
[9] Gary M. Olson et al. The Growth of Cognitive Modeling in Human-Computer Interaction Since GOMS. Human-Computer Interaction, 1990.
[10] Michael Freed et al. A Conceptual Framework for Predicting Error in Complex Human-Machine Environments. 1998.
[11] Muneo Kitajima et al. A comprehension-based model of correct performance and errors in skilled, display-based, human-computer interaction. International Journal of Human-Computer Studies, 1995.
[12] Wayne D. Gray et al. Project Ernestine: Validating a GOMS Analysis for Predicting and Explaining Real-World Task Performance. Human-Computer Interaction, 1993.
[13] Bonnie E. John and David E. Kieras. The GOMS Family of Analysis Techniques: Tools for Design and Evaluation. 1994.
[14] Jean MacMillan et al. A Comparison of Alternatives for Automated Decision Support in a Multi-Task Environment. 1997.
[15] D. Norman. Categorization of action slips. 1981.
[16] Michael Freed et al. Managing Multiple Tasks in Complex, Dynamic Environments. AAAI/IAAI, 1998.
[17] J. Shaoul. Human Error. Nature, 1973.
[18] Melvin D. Montemerlo et al. The Judgmental Nature of Task Analysis. 1978.
[19] Michael D. Byrne et al. A Working Memory Model of a Common Procedural Error. Cognitive Science, 1997.