Towards Metrics and Visualizations Sensitive to Coevolutionary Failures

The task of monitoring success and failure in coevolution is inherently difficult, as domains need not have any external metric to measure performance. Past metrics and visualizations for coevolution have been limited to identification and measurement of success but not failure. We suggest circumventing this limitation by switching from “best-of-generation”-based techniques to “all-of-generation”-based techniques. Using “all-of-generation” data, we demonstrate one such technique, a population-differential technique, that allows us to profile and distinguish an assortment of coevolutionary successes and failures, including arms-race dynamics, disengagement, cycling, forgetting, and relativism.

Introduction

Coevolution requires no domain-specific notion of objective fitness, enabling coevolutionary algorithms to learn in domains for which no objective metric is known or for which known metrics are too expensive. But this benefit comes at the expense of accountability, as there is consequently no external metric with which to measure an algorithm’s performance.

Responses to this feedback void have come in the form of proposals for dynamics-based progress metrics. The most frequently used metrics have all been based upon Current Individual versus Ancestral Opponent (CIAO) plots (Cliff & Miller 1995). The CIAO plot offers useful feedback on performance, but it has limitations. In this paper, we focus primarily on one such limitation: the inability to provide feedback on coevolutionary failures. We present a related alternative, which could be called a Current Population versus Ancestral Opponent (CPAO) plot. We then offer one metric based upon the data in this plot.

One inherent difficulty in proposing metrics for processes lacking objective fitness valuations is that such metrics cannot be proved accurate. To address this, we examine a simple coevolutionary domain that is measurable, and look for corroborating results.
The simple domains used for algorithmic auditing in this paper are the Numbers Games, introduced by Watson & Pollack (2001). That work is particularly relevant here, as it focused on addressing counter-productive behaviors in coevolutionary systems, which are often responsible for the failures for which we are interested in acquiring feedback. The final example presented uses a domain, Rock-Paper-Scissors, that possesses no objective metric. While the results drawn cannot be corroborated by such a metric, they are consistent with the cyclic nature of the game. We present this set of examples after reviewing existing CIAO-based techniques and presenting a CPAO-based alternative.

Copyright © 2005, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.

Best-of-Generation Techniques

Analysis based on generation tables was first proposed in coevolution by Cliff & Miller (1995) based on CIAO data, and this work has subsequently been explored and built upon in several ways, including the Masters Tournament (Floreano & Nolfi 1997), the Dominance Tournament (Stanley & Miikkulainen 2002), and the Hall of Fame (Rosin & Belew 1997).

In two-population coevolution, a generation table assigns the table rows to the first population’s sequence of generations, and assigns the table columns to successive generations of the second population. Internal table entries contain the results of evaluating the combination of the corresponding row and column generations. For data visualization, Cliff and Miller turn their tables into bitmap images (one pixel per table entry), and this paper employs a slightly modified version of that pixel-per-entry approach.[1]

This organization of data is valuable in making apparent the Red Queen effect: values drawn from evaluations along the table’s diagonal[2] are simply incomparable to one another. Graphs displaying this instantaneous fitness over time are excellent illustrations of the Red Queen effect (see Fig. 1).
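The generation table and its diagonal can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: each generation is represented by a single best-of-generation individual, individuals are plain numbers, and the `greater_than` game, along with every function name below, is hypothetical and not taken from this paper or from Cliff and Miller.

```python
from typing import Callable, List, Sequence


def generation_table(
    candidates: Sequence[float],
    tests: Sequence[float],
    evaluate: Callable[[float, float], float],
) -> List[List[float]]:
    """Row i holds candidate generation i; column j holds test generation j.

    The initial generation sits in the upper-left entry, table[0][0].
    """
    return [[evaluate(c, t) for t in tests] for c in candidates]


def diagonal(table: List[List[float]]) -> List[float]:
    """Instantaneous fitness: generation i evaluated against generation i."""
    if not table:
        return []
    size = min(len(table), len(table[0]))
    return [table[i][i] for i in range(size)]


def column(table: List[List[float]], j: int) -> List[float]:
    """Every candidate generation evaluated against one fixed test generation."""
    return [row[j] for row in table]


def greater_than(candidate: float, test: float) -> float:
    """Hypothetical game (an assumption): the larger number wins, ties score 0.5."""
    if candidate > test:
        return 1.0
    if candidate == test:
        return 0.5
    return 0.0


# Both populations improve in lockstep (hypothetical data).
candidates = [1.0, 2.0, 3.0, 4.0]
tests = [1.0, 2.0, 3.0, 4.0]
table = generation_table(candidates, tests, greater_than)

print(diagonal(table))   # flat, although both populations are improving
print(column(table, 0))  # rises, because the test is held fixed
```

In this toy setting the diagonal stays flat even though both populations improve every generation, which is the Red Queen effect in miniature, while a single column does show the improvement because its test never changes.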
[1] The figures included in this work are oriented differently from those of Cliff and Miller, however. In this paper, the initial generation is placed in the upper-left. Additionally, the data for the entire generation table is calculated here.
[2] Specifically, the diagonal on which the i-th candidate generation is evaluated using the i-th test generation.

Generation table values are only comparable if either the candidate or the test is kept constant. For example, if one knows how a candidate at time t performs against some test T and one knows how the candidate at time t + 1 performs against that same test, comparing the results may provide an indication of progress over time. If the second candidate were evaluated against something other than T, however, the comparison of results could no longer claim to be a valid in