A Probabilistic Multimedia Retrieval Model and Its Evaluation

We present a probabilistic model for the retrieval of multimodal documents. The model is based on Bayesian decision theory and combines models for text-based search with models for visual search. The textual model follows the language modelling approach to text retrieval, and the visual information is modelled as a mixture of Gaussian densities. Both models have proved successful on various standard retrieval tasks. We evaluate the multimodal model on the search task of TREC's video track. We find that disclosing video material on the basis of visual information alone is still too difficult: even for purely visual information needs, text-based retrieval outperforms visual approaches. The probabilistic model is useful for text, visual, and multimedia retrieval. Unfortunately, the simplifying assumptions that reduce its computational complexity also degrade retrieval effectiveness. As to whether the model can effectively combine information from different modalities, we conclude that whenever both modalities yield reasonable scores, a combined run outperforms the individual runs.
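
To make the combination concrete, the following is a minimal sketch of how a language-model text score and a Gaussian-mixture visual score might be combined into one ranking score. It is an illustration under stated assumptions, not the implementation evaluated here: the function names, the interpolation weight `lam`, the diagonal covariances, and the assumption that the textual and visual parts of a query are independent given the document are all hypothetical choices that the abstract does not specify.

```python
import math


def text_log_likelihood(query_terms, doc_terms, collection_terms, lam=0.5):
    """Log-likelihood of the query terms under a smoothed unigram document model.

    Assumption: linear interpolation of document and collection frequencies
    with weight `lam`; every query term must occur in the collection.
    """
    score = 0.0
    for term in query_terms:
        p_doc = doc_terms.count(term) / len(doc_terms)
        p_col = collection_terms.count(term) / len(collection_terms)
        score += math.log(lam * p_doc + (1.0 - lam) * p_col)
    return score


def gaussian_density(x, mean, var):
    """Density of a Gaussian with diagonal covariance (assumed for simplicity)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def visual_log_likelihood(query_samples, mixture):
    """Log-likelihood of query feature vectors under a document's mixture.

    `mixture` is a list of (weight, mean, variance) components.
    """
    score = 0.0
    for x in query_samples:
        density = sum(w * gaussian_density(x, m, v) for w, m, v in mixture)
        score += math.log(density)
    return score


def combined_score(query_terms, query_samples, doc_terms, mixture, collection_terms):
    """Sum of the two log-likelihoods, i.e. a product of probabilities.

    Assumption: text and visual evidence are independent given the document.
    """
    return (text_log_likelihood(query_terms, doc_terms, collection_terms)
            + visual_log_likelihood(query_samples, mixture))
```

Documents would then be ranked by `combined_score` in descending order. In this sketch a modality that fits the query poorly contributes a very negative log-likelihood, so it can pull down the combined score as well as raise it.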
