Paper information - Semantic feature extraction with multidimensional hidden Markov model

Semantic feature extraction with multidimensional hidden Markov model

Conventional block-based classification labels each block of an image individually, disregarding any adjacency information. When analyzing a small region of an image, it is sometimes difficult even for a person to tell what the image is about. Hence, the drawback of context-free use of visual features is recognized up front. This paper studies a context-dependent classifier based on a two-dimensional hidden Markov model. In particular, we explore how the balance between structural information and content description affects precision in a semantic feature extraction scenario. We train a set of semantic classes using the development video archive annotated by the TRECVid 2005 participants. To extract semantic features, the classes with maximum a posteriori probability are searched jointly for all blocks. Preliminary results indicate that the performance of the system can be increased by varying the block size.

Benoit Huet | Joakim Jiten | Bernard Merialdo
