Semantic feature extraction with multidimensional hidden Markov model

Conventional block-based classification labels the individual blocks of an image independently, disregarding any adjacency information. When analyzing a small region of an image, it is sometimes difficult even for a person to tell what the image is about. Hence, the drawback of context-free use of visual features is recognized up front. This paper studies a context-dependent classifier based on a two-dimensional hidden Markov model. In particular, we explore how the balance between structural information and content description affects precision in a semantic feature extraction scenario. We train a set of semantic classes using the development video archive annotated by the TRECVid 2005 participants. To extract semantic features, the classes with maximum a posteriori probability are searched jointly for all blocks. Preliminary results indicate that the performance of the system can be increased by varying the block size.
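The difference between context-free labeling and a joint maximum a posteriori search can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a causal 2-D model in which each block's class depends on its upper and left neighbours, and it finds the joint maximum by exhaustive enumeration, which is only feasible for tiny grids. The function names, the table layout (`log_trans` with an extra index for a missing neighbour), and all probabilities in `toy_example` are hypothetical.

```python
import itertools
import numpy as np

def independent_labels(log_emit):
    """Context-free baseline: pick the best class for each block on its own."""
    return log_emit.argmax(axis=2)

def joint_map_labels(log_emit, log_trans):
    """Exhaustive joint MAP labeling of an H x W grid of blocks.

    log_emit:  (H, W, K) log-likelihood of each block's features per class.
    log_trans: (K+1, K+1, K) log P(class | upper class, left class);
               index K stands for "no neighbour" at the image border
               (an assumption of this sketch).
    """
    H, W, K = log_emit.shape
    best_score, best_labels = -np.inf, None
    # Enumerate all K**(H*W) labelings; real systems need approximate decoding.
    for flat in itertools.product(range(K), repeat=H * W):
        s = np.array(flat).reshape(H, W)
        score = 0.0
        for i in range(H):
            for j in range(W):
                up = s[i - 1, j] if i > 0 else K
                left = s[i, j - 1] if j > 0 else K
                score += log_trans[up, left, s[i, j]] + log_emit[i, j, s[i, j]]
        if score > best_score:
            best_score, best_labels = score, s
    return best_labels, best_score

def toy_example():
    """Hypothetical 2 x 2 grid with two classes and made-up probabilities."""
    K = 2
    trans = np.full((K + 1, K + 1, K), 1.0 / K)  # uniform when neighbours disagree
    for n in range(K):
        for c in range(K):
            p_one = 0.8 if c == n else 0.2
            trans[n, K, c] = p_one  # only an upper neighbour exists
            trans[K, n, c] = p_one  # only a left neighbour exists
            trans[n, n, c] = 0.9 if c == n else 0.1  # both neighbours agree
    emit = np.empty((2, 2, K))
    emit[:, :] = [0.9, 0.1]  # three blocks clearly look like class 0
    emit[1, 1] = [0.4, 0.6]  # one ambiguous block leans towards class 1
    return np.log(emit), np.log(trans)

if __name__ == "__main__":
    log_emit, log_trans = toy_example()
    print(independent_labels(log_emit))
    print(joint_map_labels(log_emit, log_trans)[0])
```

In this toy setting the ambiguous block is labeled class 1 when classified on its own, while the joint search assigns it class 0 because both of its neighbours support that class. The cost of enumeration grows exponentially with the number of blocks, which is one reason the block size matters in practice.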
