Visual Gesture Recognition by a Modular Neural System

The visual recognition of human hand pointing gestures from stereo pairs of video camera images provides a very intuitive kind of man-machine interface. We show that a modular, neural network based system can solve this task in a realistic laboratory environment. Several neural networks account for image segmentation, estimation of hand location, estimation of the 3D pointing direction, and the necessary transforms from image to world coordinates and vice versa. The functions of all network modules can be learned from data examples alone, using a variety of learning algorithms. We investigate the performance of such a system and discuss the problem of operator-independent recognition.
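To illustrate the modular decomposition described above, the following minimal sketch chains stand-in modules for segmentation, hand localization, the image-to-world transform, and pointing-direction estimation. All module names, input/output sizes, and the tiny untrained MLPs are illustrative assumptions, not the architectures or training procedures used in the system itself.

```python
import numpy as np

rng = np.random.default_rng(0)

class MLP:
    """Tiny two-layer perceptron standing in for one trained network module."""
    def __init__(self, n_in, n_hidden, n_out):
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))

    def __call__(self, x):
        h = np.tanh(self.W1 @ x)   # hidden layer
        return self.W2 @ h         # linear output layer

# One stand-in module per subtask named in the abstract (sizes are arbitrary).
segment_left   = MLP(n_in=64 * 64, n_hidden=32, n_out=64 * 64)  # image segmentation, left camera
segment_right  = MLP(n_in=64 * 64, n_hidden=32, n_out=64 * 64)  # image segmentation, right camera
locate_hand    = MLP(n_in=64 * 64, n_hidden=16, n_out=2)        # 2D hand position in an image
image_to_world = MLP(n_in=4,       n_hidden=16, n_out=3)        # stereo (u, v) pair -> 3D world point
point_dir      = MLP(n_in=3 + 4,   n_hidden=16, n_out=3)        # 3D pointing direction

def recognize(left_img, right_img):
    """Chain the modules: segmentation -> localization -> 3D transform -> direction."""
    seg_l = segment_left(left_img.ravel())
    seg_r = segment_right(right_img.ravel())
    uv_l = locate_hand(seg_l)
    uv_r = locate_hand(seg_r)
    stereo = np.concatenate([uv_l, uv_r])
    hand_xyz = image_to_world(stereo)                       # hand position in world coordinates
    direction = point_dir(np.concatenate([hand_xyz, stereo]))
    return hand_xyz, direction / np.linalg.norm(direction)  # unit pointing direction

# Dummy stereo pair of grey-level images, standing in for camera input.
left = rng.random((64, 64))
right = rng.random((64, 64))
print(recognize(left, right))
```

Because each stage is a separate network, every module could in principle be trained on its own subtask from labeled examples and then composed into the full recognition pipeline, which is the modularity the abstract refers to.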