Autonomous learning for a cognitive agent using continuous models and inductive logic programming from audio-visual input

A framework for autonomous (human-like) learning of object, event and protocol models from audio-visual data, for use by an artificial "cognitive agent", is presented. This is motivated by the aim of creating a synthetic agent that can observe a scene containing unknown objects and agents, operating under unknown spatio-temporal motion protocols, and learn models of these objects and protocols sufficient to act in accordance with the implicit protocols presented to it. The framework supports low-level (continuous) statistical learning methods for object learning, and higher-level (symbolic) learning of event sequences representing implicit temporal protocols (analogous to grammar learning). Symbolic learning is performed using the "Progol" Inductive Logic Programming (ILP) system to generalise a symbolic data set formed using the lower-level (continuous) methods. The subsumption learning approach employed by the ILP system allows generalisation of concepts such as equality, transitivity and symmetry, which are not easily generalised using standard statistical techniques, and supports the automatic selection of relevant configural and temporal information. The system is potentially applicable to a wide range of domains, and is demonstrated in multiple simple game-playing scenarios, in which the agent first observes a human playing a game (including vocal and facial expression), and then attempts game playing based on the low-level (continuous) and high-level (symbolic) generalisations it has formulated.
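The continuous-to-symbolic bridge described above can be pictured with a short sketch: continuous object features are clustered into discrete classes (the low-level statistical stage), and each observed event is then emitted as a Prolog-style fact for an ILP system such as Progol to generalise. This is a minimal illustration under assumed representations; the feature dimensions, predicate names, cluster count and file name are hypothetical, not taken from the paper.

```python
# Minimal sketch of the continuous-to-symbolic bridge: continuous object
# features are clustered into discrete classes (low-level statistical
# learning), then each observed event is written out as a Prolog-style
# fact for symbolic generalisation by an ILP system such as Progol.
# Feature names, predicate names and the cluster count are illustrative
# assumptions, not the paper's actual representation.
import numpy as np
from sklearn.cluster import KMeans

# Continuous descriptors for each tracked object (e.g. colour, shape moments).
features = np.array([
    [0.91, 0.12, 0.30],   # observation 1
    [0.88, 0.15, 0.28],   # observation 2 (similar to observation 1)
    [0.10, 0.85, 0.77],   # observation 3 (a different object class)
    [0.12, 0.80, 0.75],   # observation 4
])

# Low-level (continuous) learning: unsupervised clustering assigns each
# observation a discrete object-class symbol.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
symbols = [f"class_{label}" for label in kmeans.labels_]

# Each time step pairs an object symbol with an observed action, forming
# the symbolic data set that the ILP stage generalises over.
actions = ["pick", "pick", "discard", "discard"]
with open("events.pl", "w") as f:
    for t, (sym, act) in enumerate(zip(symbols, actions)):
        # e.g.  event(t0, class_0, pick).
        f.write(f"event(t{t}, {sym}, {act}).\n")
```

In a full pipeline, the generated facts would be paired with mode declarations and background knowledge before induction; this sketch stops at producing the symbolic data set.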
