Visual learning and its application to sensorimotor actions (computer vision)

Recognition capability is a key indicator of the capability of an autonomous agent. However, the high dimensionality and variation of sensory input make learning for recognition a very challenging task. The goal of the work presented here is to enable a system to learn directly from unsegmented and unedited sensory streams while interacting with its environment, including human teachers. The system must automatically generate states for tasks that the programmer does not know or even understand; thus, the internal representation must be generated automatically. Although the framework is applicable to various learning modes, this work concentrates on the mode where desired actions are imposed in real time during training. A major technical challenge in realizing this objective is to automatically establish the mapping between a high-dimensional input space and an output space. This mapping is accomplished by a doubly clustered, subspace-based hierarchical discriminating regression (HDR) tree, proposed in this work to deal efficiently with both classification and regression problems in high-dimensional space. The major characteristics of this algorithm include: (1) Clustering is performed in both the output space and the input space at each internal node, hence the term “doubly clustered.” (2) Discriminants in the input space are automatically derived from the clusters in the input space. (3) A hierarchical probability distribution model is applied to the resulting discriminating subspace at each internal node. This realizes a coarse-to-fine approximation of the probability distribution of the input samples. (4) A sample-size-dependent negative-log-likelihood (NLL) is introduced to relax the per-class sample requirement. (5) Execution of the HDR tree is fast because of its logarithmic time complexity. To support interactive, real-time learning in the environment, an incremental version of the HDR tree (IHDR tree) was designed.
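Characteristics (1) and (2) can be illustrated with a minimal sketch of a single internal node. It is not the HDR implementation: it assumes plain k-means on the outputs, uses the per-cluster input means as input-space clusters, and takes the differences between those means and their overall mean as the vectors spanning the discriminating subspace. The function names (`cluster_outputs`, `doubly_cluster`, `nearest`) are hypothetical, and the probability-based distance of characteristics (3) and (4) is replaced by Euclidean distance.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (stand-in for the NLL-based distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index of the center closest to the point."""
    return min(range(len(centers)), key=lambda i: sq_dist(point, centers[i]))


def cluster_outputs(ys: Sequence[Vector], k: int, iters: int = 10) -> List[int]:
    """Plain k-means in the output space; returns one label per sample."""
    centers = [list(y) for y in ys[:k]]  # naive init: first k samples (assumption)
    labels = [0] * len(ys)
    for _ in range(iters):
        labels = [nearest(y, centers) for y in ys]
        for c in range(k):
            members = [y for y, lab in zip(ys, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = mean(members)
    return labels


def doubly_cluster(
    xs: Sequence[Vector], ys: Sequence[Vector], k: int
) -> Tuple[List[Vector], List[Vector]]:
    """Cluster outputs, derive input clusters from the output labels,
    and return (input cluster centers, vectors spanning the subspace)."""
    labels = cluster_outputs(ys, k)
    x_centers = [
        mean([x for x, lab in zip(xs, labels) if lab == c])
        for c in range(k)
        if c in labels
    ]
    overall = mean(x_centers)
    # Differences between cluster centers and their mean span the subspace
    # in which the clusters are separated.
    basis = [[a - b for a, b in zip(center, overall)] for center in x_centers]
    return x_centers, basis


# Toy data: the third input component is constant and carries no information.
xs = [[0.0, 0.0, 5.0], [0.2, 0.0, 5.0], [4.0, 4.0, 5.0], [4.2, 4.0, 5.0]]
ys = [[0.0], [0.1], [1.0], [1.1]]
x_centers, basis = doubly_cluster(xs, ys, k=2)
child = nearest([3.9, 4.1, 5.0], x_centers)  # child node a query would descend to
```

In this toy run the spanning vectors have a zero third component, so the uninformative input dimension is ignored when a query is routed to a child node.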
The IHDR tree rejects or accepts a learning sample according to the real-time response. A forgetting process is applied in the IHDR tree algorithm to constrain memory growth without causing a sudden drop in execution performance. To give the robot a longer context, its current state and previous action feedback were also included in its input. The HDR tree algorithms were tested on different types of data: synthetic data for examining near-optimal performance, and large raw face-image databases, with comparisons against major existing methods such as CART, C5.0, and OC1. In addition, the IHDR tree was applied to the vision-based navigation problem using simulated data. The proposed algorithm was also applied to real-time tracking and reaching tasks on a robot.
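The abstract does not specify how the forgetting process works. As a hedged illustration only, one common way to let an incremental estimator forget gradually is an amnesic running mean, which gives recent samples more weight than a plain average so that old data fades smoothly instead of being dropped at once. The function names and the parameter `mu` below are assumptions made for this sketch, not details taken from the work.

```python
from typing import List, Sequence


def amnesic_update(
    mean: Sequence[float], sample: Sequence[float], n: int, mu: float = 1.0
) -> List[float]:
    """Fold the n-th sample (n >= 1) into a running mean.

    mu = 0 gives the ordinary running average; mu > 0 weights new
    samples more heavily, so older samples are gradually forgotten.
    """
    if n == 1:
        return list(sample)
    w_new = min(1.0, (1.0 + mu) / n)  # weight of the new sample, capped at 1
    w_old = 1.0 - w_new
    return [w_old * m + w_new * s for m, s in zip(mean, sample)]


def running_mean(stream: Sequence[Sequence[float]], mu: float) -> List[float]:
    """Apply amnesic_update over a whole stream of samples."""
    mean: List[float] = []
    for n, sample in enumerate(stream, start=1):
        mean = amnesic_update(mean, sample, n, mu)
    return mean


# The last sample differs from the rest; the amnesic mean follows it faster.
stream = [[0.0], [0.0], [0.0], [0.0], [10.0]]
plain = running_mean(stream, mu=0.0)
amnesic = running_mean(stream, mu=1.0)
```

Because the update needs only the current mean and a sample count, memory per stored cluster stays constant however many samples have been seen.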