On Parsing Visual Sequences with the Hidden Markov Model

Hidden Markov Models (HMMs) have been employed in many vision applications to model and identify events of interest. Most commonly, they are used to classify previously segmented portions of video as one of a set of modelled events. HMMs can also segment and classify events simultaneously within a continuous video, without a separate first step to identify the start and end of each event, but this use is significantly less common. This paper explores the development of HMM frameworks for such complete event recognition. A review of how HMMs have been applied to both event classification and recognition is presented. The discussion is developed in parallel with an example of a real application in psychology. The complete videos depict sessions in which candidates perform a number of different exercises under the instruction of a psychologist. The goal is to isolate the portions of video containing just one of these exercises, in which the head of a kneeling subject is rotated to the left, back to centre, to the right, back to centre, and so on for a number of repetitions. Designing an HMM system to isolate these portions automatically provides a setting in which to review issues such as the choice of event to be modelled, feature design and selection, and training and testing. The paper thus shows how HMMs can be applied more extensively to event recognition in video.
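The idea of segmenting and classifying in a single pass can be illustrated with a small sketch. It is not the system developed in the paper: the four states, the quantised head-direction symbols, and every probability below are assumptions chosen only for illustration. Viterbi decoding assigns a state to each frame of a continuous sequence, and the exercise boundaries are then read off the decoded path rather than being supplied by a separate segmentation step.

```python
import math

# Hypothetical states and per-frame symbols (assumed, not from the paper).
STATES = ["other", "centre", "left", "right"]
SYMBOLS = ["none", "C", "L", "R"]  # symbol i is the typical output of state i


def near_diagonal(n, p_same):
    """Log-probability matrix with p_same on the diagonal, the rest shared."""
    p_other = (1.0 - p_same) / (n - 1)
    return [[math.log(p_same if i == j else p_other) for j in range(n)]
            for i in range(n)]


def viterbi(obs, log_start, log_trans, log_emit):
    """Return the most likely state index for each observation."""
    n = len(log_start)
    score = [log_start[s] + log_emit[s][obs[0]] for s in range(n)]
    back = []
    for o in obs[1:]:
        prev = score
        # Best predecessor of each state j at this frame.
        ptr = [max(range(n), key=lambda i: prev[i] + log_trans[i][j])
               for j in range(n)]
        score = [prev[ptr[j]] + log_trans[ptr[j]][j] + log_emit[j][o]
                 for j in range(n)]
        back.append(ptr)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def exercise_segments(path, background=0):
    """Group consecutive non-background frames into (first, last) pairs."""
    segments, start = [], None
    for t, s in enumerate(path):
        if s != background and start is None:
            start = t
        elif s == background and start is not None:
            segments.append((start, t - 1))
            start = None
    if start is not None:
        segments.append((start, len(path) - 1))
    return segments


# Toy continuous sequence: idle frames, one head-rotation cycle containing
# a single spurious "none" frame (index 7), then idle frames again.
frames = (["none"] * 3 + ["C"] * 2 + ["L"] * 2 + ["none"] + ["L"] * 2
          + ["C"] * 2 + ["R"] * 3 + ["C"] * 2 + ["none"] * 3)
obs = [SYMBOLS.index(f) for f in frames]

n = len(STATES)
log_start = [math.log(1.0 / n)] * n   # uniform start (assumed)
log_trans = near_diagonal(n, 0.7)     # states tend to persist (assumed)
log_emit = near_diagonal(n, 0.7)      # noisy observations (assumed)

path = viterbi(obs, log_start, log_trans, log_emit)
segments = exercise_segments(path)
print([STATES[s] for s in path])
print(segments)
```

With these toy parameters, the spurious frame inside the leftward rotation is absorbed into the "left" state because leaving and re-entering that state costs more than one unlikely observation, so the cycle is recovered as a single segment. In a real system the states, features, and probabilities would be learned from training data, as the abstract notes.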
