Robust Scene Recognition Using Language Models for Scene Contexts

We propose a robust scene recognition framework that exploits scene context information in multimedia content. Multimedia content consists of sequences of scenes, and some scene sequences are more likely to occur than others. We take a statistical approach to this scene context information: a hidden Markov model (HMM) models each scene, and an n-gram language model represents the contexts among scenes. We evaluated the proposed method in scene recognition experiments on 16 scenes in video data from 25 baseball games. The proposed method significantly improved recognition results compared with recognition without scene context information.
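The combination of per-scene models and a scene language model can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a bigram (n = 2) context model, Viterbi decoding over pre-segmented video, and precomputed log-likelihoods standing in for the per-scene HMM scores. The function name `decode_scenes`, the `lm_weight` parameter, the scene labels, and all numbers are hypothetical.

```python
import math


def decode_scenes(emission_logp, bigram_logp, initial_logp, lm_weight=1.0):
    """Return the most likely scene label sequence (Viterbi search).

    emission_logp: one dict per video segment, {scene: log P(features | scene)},
                   standing in for the per-scene HMM log-likelihoods (assumed given).
    bigram_logp:   {previous_scene: {scene: log P(scene | previous_scene)}}.
    initial_logp:  {scene: log P(scene)} for the first segment.
    lm_weight:     hypothetical scaling of the context model; 0 ignores context.
    """
    scenes = list(initial_logp)
    # best[s] is the score of the best path so far that ends in scene s.
    best = {s: lm_weight * initial_logp[s] + emission_logp[0][s] for s in scenes}
    backpointers = []
    for obs in emission_logp[1:]:
        new_best, pointer = {}, {}
        for cur in scenes:
            prev = max(scenes, key=lambda p: best[p] + lm_weight * bigram_logp[p][cur])
            new_best[cur] = best[prev] + lm_weight * bigram_logp[prev][cur] + obs[cur]
            pointer[cur] = prev
        best = new_best
        backpointers.append(pointer)
    # Trace back from the best final scene to recover the full sequence.
    path = [max(scenes, key=lambda s: best[s])]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path


# Made-up toy data: the second segment is acoustically/visually ambiguous.
emissions = [
    {"pitch": -0.1, "hit": -3.0},
    {"pitch": -1.0, "hit": -1.2},
]
bigram = {
    "pitch": {"pitch": math.log(0.1), "hit": math.log(0.9)},
    "hit": {"pitch": math.log(0.9), "hit": math.log(0.1)},
}
initial = {"pitch": math.log(0.5), "hit": math.log(0.5)}

with_context = decode_scenes(emissions, bigram, initial, lm_weight=1.0)
without_context = decode_scenes(emissions, bigram, initial, lm_weight=0.0)
```

In this toy case the ambiguous second segment is labeled "pitch" when context is ignored and "hit" when the bigram model is applied, which is the kind of correction the scene context information is meant to provide.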
