Vision-based robotic gesture recognition

Vision in complex environments poses a major scientific challenge for two reasons. First, a scene in such an environment cannot be segmented into its constituent objects on the basis of simple cues. Second, the system must tolerate unpredictable changes in the environment. Robotic gesture recognition is a suitable domain for investigating these problems, since both arise there naturally. Furthermore, gesture recognition holds the promise of making man-machine interaction more natural and intuitive.

The principal idea for tackling the first problem is to integrate information from different cues. The first part of this thesis presents methods for tracking human hands, finding fingertips, and recognizing hand postures against complex backgrounds; these methods owe their robustness to the integration of complementary cues. The components have been integrated into a user-independent gesture interface implemented on an anthropomorphic robot.

The second part is concerned with the adaptive integration of different cues, which addresses the second problem. It proposes a model of adaptive sensory integration in the brain that relates the psychophysical phenomena of suppression and recalibration of discordant sensory information to a self-organized adaptation based on fast synaptic plasticity mechanisms. Finally, the idea of self-organized adaptation is applied to tracking human faces in a scene. To this end, an adaptive tracking scheme is proposed that combines different cues in a “democratic” manner.
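The “democratic” combination of cues mentioned above can be illustrated with a small sketch. The abstract does not give the update rules, so everything here is an assumption made for illustration: the function name `democratic_step`, the weighted sum of per-cue saliency maps, the use of each cue's response at the agreed position as its quality measure, and the relaxation constant `rate` are all hypothetical, not the scheme proposed in the thesis.

```python
import numpy as np

def democratic_step(saliency_maps, reliabilities, rate=0.2):
    """One step of a hypothetical 'democratic' cue-integration tracker.

    saliency_maps: array-like of shape (n_cues, H, W), one map per cue.
    reliabilities: array-like of shape (n_cues,), non-negative, summing to 1.
    rate: assumed relaxation constant for the reliability update.
    """
    maps = np.asarray(saliency_maps, dtype=float)
    r = np.asarray(reliabilities, dtype=float)

    # Combine the cues: reliability-weighted sum of the saliency maps.
    combined = np.tensordot(r, maps, axes=1)

    # The tracked position is the maximum of the combined map.
    idx = np.unravel_index(np.argmax(combined), combined.shape)
    target = tuple(int(i) for i in idx)

    # Assumed quality measure: each cue's own response at the agreed position,
    # normalized so that the qualities sum to 1.
    quality = maps[:, target[0], target[1]]
    total = quality.sum()
    if total > 0:
        quality = quality / total
    else:
        quality = np.full_like(r, 1.0 / len(r))

    # Cues that agree with the joint result gain influence; discordant cues
    # are gradually suppressed.
    new_r = r + rate * (quality - r)
    return target, new_r


# Two cues agree on position (1, 2); a third, discordant cue points to (0, 0).
maps = np.zeros((3, 3, 4))
maps[0, 1, 2] = 1.0
maps[1, 1, 2] = 1.0
maps[2, 0, 0] = 1.0
target, new_r = democratic_step(maps, [1 / 3, 1 / 3, 1 / 3])
```

In this toy setting the two agreeing cues outvote the third, and the third cue's reliability decreases after the step while the reliabilities still sum to one. Under these assumptions, this is the sense in which a discordant cue is suppressed rather than allowed to corrupt the result.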
