Why testing autonomous agents is hard and what can be done about it

[Figure 2: Approaches to Assurance — a two-dimensional taxonomy (abstraction vs. coverage) with Testing, Systematic Enumeration, Model Checking, and Proof as its corners, and numbered arrows ➊–➌ marking intermediate steps. Techniques are combined so that each compensates for the other's weaknesses whilst retaining its strengths.]

Abstractly, the idea of combining different techniques to obtain assurance of a system can be seen as a form of safety cases [1], where it has been argued that assurance should be provided by direct evidence linking a system with the required properties, expressed in terms of the "real world" [8, 16]. Others have argued for combining testing and proving (e.g. [10, 15, 5, 23, 13]). Space precludes a detailed discussion of these approaches, but we note that none provides a detailed and clear way of combining testing and proving in the way that is required.

In the remainder of this section, we sketch an approach for combining testing and formal verification. The key idea is that, in order to link testing and formal verification, we build a "bridge" using intermediate approaches. The bridge consists of a number of "steps", at each of which we apply two approaches that differ in only one aspect. For example, given two different models (e.g. an abstract model and an implementation), we subject both to the same assurance approach in order to look for differences between the models. Since the assurance approach is the same, any difference found must be due to the differences between the models. Alternatively, given the same model, we might subject it to two different assurance techniques. For this approach to work, we need to define a number of assurance techniques that are "intermediate", i.e. that lie between formal verification and testing. Furthermore, these techniques need to form a one-step-at-a-time "staircase" that links formal verification and testing.
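The "same technique, different models" step can be sketched in a few lines of Python. Everything below — both models, the seeded fault, and the test suite — is an illustrative assumption for the sketch, not an artefact from the paper:

```python
# One "bridge step": the SAME assurance technique (a fixed test suite)
# applied to two models that differ in only one aspect. Because the
# technique is held constant, any discrepancy found must stem from the
# difference between the models themselves.

def abstract_model(x: int) -> int:
    # Abstract specification (made executable for the sketch): absolute value.
    return abs(x)

def implementation(x: int) -> int:
    # Hypothetical concrete implementation with a seeded fault: the
    # boundary test is wrong, so x == -1 is never negated.
    return -x if x < -1 else x

# Same selected test cases run against both models.
suite = [0, 5, -3, -1]
discrepancies = [x for x in suite if abstract_model(x) != implementation(x)]
print(discrepancies)  # → [-1]: the seeded fault surfaces as a model discrepancy
```

Holding the technique fixed is what licenses the attribution: if the suites differed as well as the models, a mismatch could be blamed on either.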
We define these intermediate techniques using a two-dimensional taxonomy (see Figure 2; ignore the arrows for now). The first dimension is abstraction: we distinguish between executable ("concrete") models and non-executable ("abstract") models. The second dimension is the coverage of the method: does it cover only particular selected points in the space of inputs/behaviours ("individual test cases"); all points within a sub-space ("incomplete systematic exploration"); or the entire space ("complete systematic exploration")?

An example of how we might use these intermediate assurance techniques to build a stepwise bridge between testing and formal verification is the following (the numbers correspond to the numbered arrows in Figure 2):

➊ We subject the implementation to testing (with selected test cases) and to systematic enumeration (of a subset of the input or behaviour space), i.e. same model, different techniques. If both approaches find the same issues, this gives us some confidence that the selected test cases provide good coverage. If the systematic enumeration finds additional issues, this is evidence that the test cases are insufficient. Conversely, if the selected test cases find errors that the systematic enumeration misses, this is evidence that the scope within which enumeration is performed is too limited.

➋ We apply the same technique (systematic generation of test cases within a limited scope) to two different models (concrete and abstract). Finding the same issues in both cases gives us some confidence that the models are equivalent.

➌ We apply systematic generation within a limited scope and formal verification (complete systematic exploration) to the same abstract model, i.e. same model, different techniques.
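Step ➊ (same model, different techniques) can also be illustrated concretely. The function under test, its oracle, and the enumeration bounds below are hypothetical choices for the example, not part of the approach itself:

```python
# Step ➊ sketch: one model (a single implementation), two techniques --
# hand-picked test cases versus exhaustive enumeration of a bounded sub-space.

LIMIT = 100

def saturating_add(a: int, b: int) -> int:
    # Implementation under test, with a seeded fault: it clamps the sum
    # at the upper limit but forgets to clamp below zero.
    return min(a + b, LIMIT)

def oracle(a: int, b: int) -> int:
    # Intended behaviour: result saturates into the range [0, LIMIT].
    return max(0, min(a + b, LIMIT))

# Technique 1: selected test cases (values a tester might plausibly pick,
# all focused on the upper boundary).
selected = [(0, 0), (50, 49), (99, 5), (100, 100)]
issues_selected = {p for p in selected if saturating_add(*p) != oracle(*p)}

# Technique 2: systematic enumeration within a limited scope (small ints).
scope = range(-3, 4)
issues_enum = {(a, b) for a in scope for b in scope
               if saturating_add(a, b) != oracle(a, b)}

# The selected suite finds nothing, while enumeration exposes every
# underflowing pair -- evidence that the selected test cases are insufficient.
print(len(issues_selected), len(issues_enum))
```

The converse comparison is equally informative: a fault at, say, a = 1000 would be caught only by a well-chosen test case, signalling that the enumeration scope is too narrow.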

[1] Bojan Cukic et al. Combining complementary formal verification strategies to improve performance and accuracy. 2007.

[2] Michael Luck et al. Evolutionary testing of autonomous software agents. Autonomous Agents and Multi-Agent Systems, 2009.

[3] Daniel Jackson et al. A direct path to dependable software. Communications of the ACM, 2009.

[4] Michael Winikoff et al. On the testability of BDI agents. 2010.

[5] M. Young et al. Rethinking the Taxonomy of Fault Detection Techniques. 11th International Conference on Software Engineering (ICSE), 1989.

[6] John Rushby et al. A Safety-Case Approach for Certifying Adaptive Systems. 2009.

[7] Michael Winikoff et al. Implementing commitment-based interactions. AAMAS '07, 2007.

[8] Munindar P. Singh et al. An Architecture for Multiagent Systems: An Approach Based on Commitments. 2009.

[9] Tim Menzies et al. On the Distribution of Property Violations in Formal Models: An Initial Study. 30th Annual International Computer Software and Applications Conference (COMPSAC '06), 2006.

[10] Roberto A. Flores et al. Using a performative subsumption lattice to support commitment-based conversations. AAMAS '05, 2005.

[11] Munindar P. Singh et al. Flexible protocol specification and execution: applying event calculus planning using commitments. AAMAS '02, 2002.

[12] Munindar P. Singh et al. Multiagent commitment alignment. AAMAS, 2009.

[13] Michael Winikoff et al. Assurance of Agent Systems: What Role Should Formal Verification Play? 2010.

[14] Mark Harman et al. The Current State and Future of Search Based Software Engineering. Future of Software Engineering (FOSE '07), 2007.

[15] Michael Winikoff et al. Implementing flexible and robust agent interactions using Distributed Commitment Machines. Multiagent and Grid Systems, 2006.

[16] Lori A. Clarke et al. Partition Analysis: A Method Combining Testing and Verification. IEEE Transactions on Software Engineering, 1985.

[17] A. S. Rao. AgentSpeak(L): BDI agents speak out in a logical computable language. 1996.

[18] P. Bishop et al. The future of goal-based assurance cases. 2004.

[19] Michael R. Lowry et al. Towards a theory for integration of mathematical verification and empirical testing. Proceedings of the 13th IEEE International Conference on Automated Software Engineering (ASE), 1998.

[20] Michael Winikoff et al. Hermes: Designing Flexible and Robust Agent Interactions. Handbook of Research on Multi-Agent Systems, 2009.