Layered representations for human activity recognition

We present the use of layered probabilistic representations, built on hidden Markov models, for performing sensing, learning, and inference at multiple levels of temporal granularity. We describe the use of the representation in a system that diagnoses states of a user's activity based on real-time streams of evidence from video, acoustic, and computer interactions. We review the representation, present an implementation, and report on experiments with the layered representation in an office-awareness application.
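To make the layering idea concrete, here is a minimal sketch and not the system described above. It assumes discrete observation symbols, a bank of small HMMs per layer, and fixed non-overlapping windows. Each lower-layer window is labeled with its most likely HMM, and the resulting label sequence becomes the observation stream for the upper layer, which therefore works at a coarser time scale. All names (`make_hmm`, `classify`, `layered_inference`), the toy parameters, and the window lengths are hypothetical; the abstract does not specify features, model topology, or training.

```python
import numpy as np

def log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | discrete HMM with pi, A, B)."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha /= scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        log_p += np.log(scale)
        alpha /= scale
    return log_p

def classify(obs, bank):
    """Index of the HMM in `bank` that gives `obs` the highest likelihood."""
    return int(np.argmax([log_likelihood(obs, *hmm) for hmm in bank]))

def layered_inference(stream, lower_bank, upper_bank, window, upper_len):
    """Label short windows with the lower bank, then label runs of those
    labels with the upper bank (a coarser temporal granularity)."""
    lower = [classify(stream[i:i + window], lower_bank)
             for i in range(0, len(stream) - window + 1, window)]
    upper = [classify(lower[j:j + upper_len], upper_bank)
             for j in range(0, len(lower) - upper_len + 1, upper_len)]
    return lower, upper

def make_hmm(B):
    """Hypothetical two-state HMM with uniform start and sticky transitions."""
    pi = np.array([0.5, 0.5])
    A = np.array([[0.9, 0.1], [0.1, 0.9]])
    return pi, A, np.array(B)

# Toy models (assumed values): model 0 favors symbol 0, model 1 favors symbol 1.
prefers_0 = make_hmm([[0.9, 0.1], [0.8, 0.2]])
prefers_1 = make_hmm([[0.1, 0.9], [0.2, 0.8]])
lower_bank = [prefers_0, prefers_1]   # e.g. "quiet" vs. "active" sensor windows
upper_bank = [prefers_0, prefers_1]   # e.g. "away" vs. "present" activity

stream = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1]
lower, upper = layered_inference(stream, lower_bank, upper_bank,
                                 window=4, upper_len=2)
print(lower, upper)
```

In this sketch the upper layer never sees raw sensor symbols, only the lower layer's decisions, so each layer could in principle be retrained separately. Whether the original system passes hard labels or likelihoods between layers is not stated in the abstract.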
