Seeing, understanding and doing human tasks

Functional units and working algorithms for real-time visual recognition of human pick-and-place action sequences are presented. The action recognizer consists of visual feature detectors, an action/environment model, and an attention stack, and it generates a symbolic description of the observed action sequence. Given a different initial state, the system reinstantiates the recognized action sequence to carry out an equivalent assembly task. Experimental results on several assembly tasks support the effectiveness of the method.
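
As a rough illustration (not the paper's implementation), the sketch below mimics the pipeline in Python: a stand-in recognizer turns tracked object positions into a symbolic pick-and-place plan, and a reinstantiation step replays that plan from a different initial state. All names here (PickPlace, recognize, reinstantiate) and the proximity threshold eps are assumptions for illustration; the paper's attention stack and action/environment model are elided.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class PickPlace:
        """One symbolic step: put object `obj` onto object `dest`."""
        obj: str
        dest: str

    def near(a, b, eps):
        """True if 2-D points a and b lie within eps on both axes."""
        return abs(a[0] - b[0]) <= eps and abs(a[1] - b[1]) <= eps

    def recognize(frames, eps=0.05):
        """Stand-in recognizer: `frames` is a sequence of {object: (x, y)}
        snapshots from the visual feature detectors.  When an object moves
        and comes to rest next to another object, emit a symbolic PickPlace
        step."""
        plan = []
        for prev, cur in zip(frames, frames[1:]):
            for obj, pos in cur.items():
                if pos == prev[obj]:
                    continue  # object did not move in this frame
                for other, other_pos in cur.items():
                    if other != obj and near(pos, other_pos, eps):
                        plan.append(PickPlace(obj, other))
        return plan

    def reinstantiate(plan, world):
        """Replay a recognized plan from a different initial state: each
        step is grounded against the objects' *current* locations, not the
        coordinates seen during observation (robot pick/place primitives
        are assumed to exist elsewhere)."""
        for step in plan:
            world[step.obj] = world[step.dest]
        return world

    # Observation: object A is placed onto object B ...
    frames = [{"A": (0.0, 0.0), "B": (1.0, 0.0)},
              {"A": (1.0, 0.04), "B": (1.0, 0.0)}]
    plan = recognize(frames)            # -> [PickPlace(obj='A', dest='B')]

    # ... then the same symbolic plan is replayed from a new layout.
    world = {"A": (3.0, 2.0), "B": (5.0, 5.0)}
    print(reinstantiate(plan, world))   # A ends up at B's new location

Grounding each step by object name rather than by recorded coordinates is what allows the same symbolic plan to succeed from a different initial state.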