Learning language about objects and using this language to learn further: the CHILDLIKE system

Artificial intelligence and cognitive science have traditionally adopted a modular approach to intelligence: vision, language and action have been analyzed separately, and systems have been developed that demonstrate competence in each of these areas. Only recently have integrated systems that span multiple modalities and performance abilities begun to gain attention. This thesis describes an integrated, computational model of learning, implemented as a program called the CHILDLIKE system. CHILDLIKE starts by combining language and vision, and attempts to include actions and needs in the same framework. The system learns names for simple objects and their qualities from input experiences consisting of small visual feature arrays paired with short language strings. Building on this foundation, the system learns spatial relations and acquires knowledge about the effects of actions on external perceptual states and on internal need levels. CHILDLIKE employs an interacting set of hybrid symbolic-connectionist learning mechanisms: extraction, aggregation, generation, re-weighting, de-generation and generalization. Because the current version of the system is a first step toward a realistic integrated system combining vision, language and action, arguments for the scalability and generality of the approach are also presented.
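
As a rough illustration of the kind of input experience and learning cycle described above, the sketch below pairs a small visual feature array with a short language string and applies the named mechanisms in a simplified form. All names and the specific update rules here are illustrative assumptions, not the thesis's actual implementation.

```python
# Hypothetical sketch of a CHILDLIKE-style input experience and learning cycle.
# The Experience class, learn_from function, and numeric constants are
# illustrative assumptions, not the implementation described in the thesis.
from dataclasses import dataclass
from typing import List


@dataclass
class Experience:
    visual_features: List[List[float]]  # small visual feature array
    utterance: str                      # short language string


def learn_from(experience: Experience, memory: dict) -> None:
    """Apply a simplified version of the interacting learning mechanisms."""
    tokens = experience.utterance.split()
    features = tuple(map(tuple, experience.visual_features))

    # Extraction: pull out candidate word-feature pairings from this experience.
    candidates = [(tok, features) for tok in tokens]

    # Aggregation / generation: create or strengthen word-feature associations.
    for word, feat in candidates:
        memory.setdefault(word, {})
        memory[word][feat] = memory[word].get(feat, 0.0) + 1.0

    # Re-weighting / de-generation: decay associations not supported by this
    # experience, and drop associations whose weight falls below a threshold.
    for word, assoc in memory.items():
        for feat in list(assoc):
            if (word, feat) not in candidates:
                assoc[feat] *= 0.9
                if assoc[feat] < 0.05:
                    del assoc[feat]

    # Generalization (omitted here) would merge associations that share
    # feature structure into more abstract ones.


memory: dict = {}
learn_from(Experience(visual_features=[[1.0, 0.0], [0.0, 1.0]],
                      utterance="red ball"), memory)
```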