Vision and Learning for Intelligent Human-Computer Interaction

Making computers see has long been a dream. Research in computer vision provides promising technologies to capture, analyze, transmit, retrieve, and interpret visual information. However, because visual inputs are rich and highly variable, many statistical learning techniques for visual motion capture and recognition face similar problems in practice, and building intelligent, visually capable machines remains a challenging task. This dissertation concentrates on two important problems, capturing and recognizing human motion in video sequences, both of which are crucial for research and applications in intelligent human-computer interaction, multimedia communication, and smart environments. It presents three effective techniques for visual motion analysis: non-stationary color model adaptation for efficient localization, integration of multiple visual cues for robust tracking, and learning motion models for capturing articulated hand motion. In addition, it describes a novel statistical learning method, the Discriminant-EM (D-EM) algorithm, within the self-supervised learning paradigm. D-EM employs both labeled and unlabeled training data and thereby combines supervised and unsupervised learning. Many topics in the dissertation are unified by the four problems of self-supervised learning: transduction, co-transduction, model transduction, and co-inferencing. Extensive experiments and two prototype systems have validated the proposed approaches in the domain of vision-based human-computer interaction.
