Induction of Prototypes in a Robotic Setting Using Local Search MDL

Gregory M. Kobele ∗, Jason Riggle ∗, Richard Brooks †, David Friedlander ‡, Charles Taylor §, Edward Stabler ∗

Abstract

Categorizing objects sets the stage for more advanced interactions with the environment. Minimum Description Length learning provides a framework in which to investigate processes by which concept learning might take place. Importantly, the concepts so acquired can be viewed as having a prototype structure: the concepts may apply to one object better than to another. We ground our discussion in a real-world setting: the objects to categorize are sensor readings of the behaviours of two mobile robots.

Introduction

Natural and artificial agents are confronted with the problem of constructing models of their environments, thereby imposing order on an otherwise chaotic stream of sensory data. An important step in this direction is to recognize when a novel sense datum is sufficiently similar to a previously seen one to make a similar response adaptive. This is the problem of categorization: how does one decide which properties of sense data are relevant, and which are accidental? Obviously, treating all properties as relevant leads to no generalization at all, and treating all properties as accidental leads to maladaptive behaviour (unless the environment is perverse). This problem has been explored in various settings using Minimum Description Length (MDL) concept learning, where the notions of concise encoding of the hypothesis and of efficient encoding of the observed data impose competing pressures (generalizing by minimizing theory size while still being informative about the data), and where the measures of theory size and empirical informativeness can be adjusted to fit the domain. We have previously applied this framework both to the classification of simple robotic behaviour [1] and to language evolution [4, 5]. In these previous studies concepts (or languages) were represented as deterministic finite state automata, and percepts were finite strings over a fixed finite alphabet.

Our current study extends the previous ones in two ways. First, we enrich the hypothesis space of our agents to include non-deterministic automata. Although deterministic and non-deterministic machines are equivalent in expressive power, enriching the agent’s hypothesis space in this manner has repercussions for the results of the learning process. Second, we exploit a particular property of the MDL setup, yielding a cognitively interesting prototype structure of concepts which allows for degrees of membership in a category. One novel aspect of the prototype structure presented here is that the degrees of category membership are a holistic and emergent property of the conceptual space. That is, whether a perception is a “good example” of a particular category depends on what other categories are available. In our setting, concepts are represented as finite automata, and perceptions as strings over a fixed finite alphabet. Given a space of concepts 𝒞, a perception p is a good example of a category C ∈ 𝒞 just in case p ∈ L(C) and, for any other C′ ∈ 𝒞 with p ∈ L(C′), the cost of encoding p in C is less than or equal to the cost of encoding p in C′ (enc(p, C) ≤ enc(p, C′)).

We present a simple example in which a learner is faced with the task of categorizing the behaviour of two robots, given by means of sensors in the robots’ environment. In the learning stage each set of sensor data is labeled as an instance of a concept (e.g. ‘random walking’, ‘wall following’, ‘chasing’, ...). Our robot learner then extracts the salient structure from each training set, possibly generalizing its theory of the training sets to include as yet unseen sensor inputs.
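The good-example criterion, enc(p, C) ≤ enc(p, C′) for every other concept C′ that accepts p, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The cost function `enc` assumes, hypothetically, that each step costs log2 of the number of choices available at the current state (outgoing transitions, plus stopping if the state is accepting), and that a non-deterministic automaton is charged for its cheapest accepting path. The class `Automaton`, the function `good_example_of`, and the two toy concepts are invented for the example.

```python
import math


class Automaton:
    """A (possibly non-deterministic) finite automaton.

    transitions maps (state, symbol) -> set of next states.
    """

    def __init__(self, start, accepting, transitions):
        self.start = start
        self.accepting = set(accepting)
        self.transitions = transitions

    def choices(self, state):
        # Assumed cost model: count every outgoing (symbol, target) pair,
        # plus one extra option for "stop here" if the state is accepting.
        n = sum(len(targets)
                for (s, _), targets in self.transitions.items()
                if s == state)
        return n + (1 if state in self.accepting else 0)


def enc(p, automaton):
    """Bits needed for the cheapest accepting path of p.

    Returns math.inf when p is not in the automaton's language.
    """
    # best[state] = cheapest cost of reaching state on the prefix read so far
    best = {automaton.start: 0.0}
    for symbol in p:
        nxt = {}
        for state, cost in best.items():
            targets = automaton.transitions.get((state, symbol), ())
            if not targets:
                continue
            step = math.log2(automaton.choices(state))
            for target in targets:
                if cost + step < nxt.get(target, math.inf):
                    nxt[target] = cost + step
        best = nxt
    # Pay for the final decision to stop in an accepting state.
    finals = [cost + math.log2(automaton.choices(s))
              for s, cost in best.items() if s in automaton.accepting]
    return min(finals, default=math.inf)


def good_example_of(p, concepts):
    """Names of the concepts C with p in L(C) and minimal enc(p, C)."""
    costs = {name: enc(p, a) for name, a in concepts.items()}
    finite = {name: c for name, c in costs.items() if c < math.inf}
    if not finite:
        return []
    cheapest = min(finite.values())
    return sorted(name for name, c in finite.items() if c == cheapest)


# Two hypothetical toy concepts over the alphabet {a, b}.
anything = Automaton("q", {"q"}, {("q", "a"): {"q"}, ("q", "b"): {"q"}})
only_a = Automaton("q", {"q"}, {("q", "a"): {"q"}})
concepts = {"anything": anything, "only_a": only_a}

print(good_example_of("aaa", concepts))  # ['only_a']
print(good_example_of("ab", concepts))   # ['anything']
```

Both toy concepts accept "aaa", but it is a good example only of `only_a`, because the more permissive automaton spends more bits per step. Removing `only_a` from the concept space makes "aaa" a good example of `anything`, which illustrates how category membership depends on which other categories are available.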
∗ Department of Linguistics, UCLA, Los Angeles, CA 90095
† Distributed Systems Department, The Pennsylvania State University, State College, PA 16804
‡ Informatics Department, The Pennsylvania State University, State College, PA 16804
§ Department of Organismic Biology, Evolution and Ecology, UCLA, Los Angeles, CA 90095

Minimum Description Length

The basic idea of the MDL framework is that the best hypothesis to adopt when confronted with a set of data is the one that best navigates between the charyb-