Event-based Multimedia Search and Retrieval for Question Answering

User-generated content, available in massive amounts on the Internet, comes in many "flavors" (e.g., micro-messages, text documents, images, and videos) and is receiving increasing attention due to its many potential applications. One important application is the automatic generation of multimedia enrichments for topics a user is interested in, and in particular the creation of event summaries from multimedia data [1]. In this talk, an event-based cross-media question answering system is proposed, which retrieves and summarizes events for a given user-generated query topic, and a framework is presented for leveraging social media data to extract and illustrate social events automatically for any given query. The system operates in three stages. First, the input query is parsed semantically to identify the topic, location, and time information of the event of interest (news, in the scenario presented here). Second, the parsed information is used to mine the latest and most popular related news from social news web services. Third, to identify unique events, the news content is modeled with latent Dirichlet allocation (LDA) and clustered with the DBSCAN algorithm. Finally, for each event, both the textual and the visual content of news items that refer to the same event are retrieved [2,3]. The resulting documents are shown within a vivid interface featuring an event description, a tag cloud, and a photo collage [4].

Popular question answering systems (e.g., Yahoo! Answers) and search engines retrieve documents on the basis of text information alone. The integration of visual information into text-based search for video and image retrieval is still an active research topic. In the second part of this talk, we propose to use visual information to enrich classic text-based search for video retrieval [5]. With the proposed framework, we show experimentally, on a set of real-world scenarios, that visual cues can contribute significantly to the quality of image/video retrieval. Experimental results show that mapping text-based queries to visual concepts improves the performance of the search system; moreover, when the relevant visual concepts for a query are selected appropriately, a very substantial further improvement of the system's performance is achieved [6]. Based on the various results presented in this talk, we argue that question answering, among other applications, can greatly leverage cross-media analysis to the benefit of users.
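To make the first stage concrete, the sketch below parses a free-text query into topic, location, and time fields using off-the-shelf named-entity recognition. spaCy and its entity labels are an assumption here for illustration; the talk does not specify which parser the actual system uses.

```python
# Hypothetical sketch of the query-parsing stage: extract topic, location,
# and time from a free-text query with off-the-shelf NER (spaCy assumed;
# not necessarily the parser used in the actual system).
import spacy

nlp = spacy.load("en_core_web_sm")

def parse_query(query: str) -> dict:
    doc = nlp(query)
    location = [ent.text for ent in doc.ents if ent.label_ in ("GPE", "LOC")]
    time = [ent.text for ent in doc.ents if ent.label_ in ("DATE", "TIME")]
    # Tokens not covered by a location/time entity are treated as topic terms.
    covered = {tok.i for ent in doc.ents
               if ent.label_ in ("GPE", "LOC", "DATE", "TIME")
               for tok in ent}
    topic = [tok.text for tok in doc
             if tok.i not in covered and not tok.is_stop and tok.is_alpha]
    return {"topic": topic, "location": location, "time": time}

print(parse_query("protests in Paris last week"))
# e.g. {'topic': ['protests'], 'location': ['Paris'], 'time': ['last week']}
```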
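The second stage could then query a social news service with the parsed fields, as in the following sketch. The endpoint URL and response shape are placeholders, since the talk does not name the actual services that are mined.

```python
# Hypothetical sketch of the news-mining stage: send the parsed fields to a
# social news service and collect recent, popular items. The endpoint URL
# and response fields are placeholders, not a real service.
import requests

def mine_news(topic, location, time, limit=50):
    """Fetch the latest and most popular news items for the parsed query."""
    resp = requests.get(
        "https://example.com/api/news/search",  # placeholder endpoint
        params={"q": " ".join(topic), "location": " ".join(location),
                "time": " ".join(time), "sort": "popularity", "limit": limit},
        timeout=10,
    )
    resp.raise_for_status()
    # Assumed response shape: a JSON list of {title, text, image_url, url}.
    return resp.json()
```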
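A minimal sketch of the third stage, assuming scikit-learn: each news item is represented by its LDA topic distribution and the items are then clustered with DBSCAN, so that items about the same real-world event fall into one cluster. All parameter values (number of topics, eps, min_samples) are illustrative, not those of the actual system.

```python
# Minimal sketch of the event-identification stage: represent each news item
# by its LDA topic distribution, then cluster with DBSCAN. Library choice and
# parameter values are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import DBSCAN

news_texts = [
    "Earthquake strikes central Italy, buildings collapse in Amatrice",
    "Strong quake hits Italy, rescue teams search for survivors",
    "Champions League final: Real Madrid beat Atletico on penalties",
]

# Bag-of-words counts, as expected by sklearn's LDA implementation.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(news_texts)

# Topic distributions act as a low-dimensional semantic representation.
lda = LatentDirichletAllocation(n_components=10, random_state=0)
doc_topics = lda.fit_transform(counts)

# DBSCAN groups documents with similar topic mixtures; label -1 marks noise.
labels = DBSCAN(eps=0.3, min_samples=2, metric="cosine").fit_predict(doc_topics)
for text, label in zip(news_texts, labels):
    print(label, text[:50])
```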
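Finally, the sketch below illustrates the idea behind the second part of the talk in its simplest form: query terms are mapped onto a small vocabulary of visual concept detectors, and videos are reranked by fusing their text-retrieval score with the detector scores of the selected concepts. The concept vocabulary, the precomputed scores, and the fusion weight are all hypothetical, standing in for the framework of [5,6].

```python
# Hypothetical sketch of text-to-visual-concept mapping for video retrieval.
# The query selects concepts from a small vocabulary, and videos are reranked
# by combining text and visual-detector scores. Vocabulary, scores, and the
# fusion weight are illustrative placeholders.
CONCEPT_VOCABULARY = {
    "beach": ["beach", "sea", "coast"],
    "crowd": ["protest", "demonstration", "rally"],
    "fire":  ["fire", "blaze", "wildfire"],
}

def select_concepts(query: str) -> list[str]:
    """Return the visual concepts whose trigger terms occur in the query."""
    terms = query.lower().split()
    return [c for c, triggers in CONCEPT_VOCABULARY.items()
            if any(t in terms for t in triggers)]

def rerank(query, videos, alpha=0.7):
    """Fuse each video's text score with the mean detector score of the
    selected concepts; both scores are assumed precomputed and in [0, 1]."""
    concepts = select_concepts(query)
    def score(v):
        visual = (sum(v["concepts"].get(c, 0.0) for c in concepts) / len(concepts)
                  if concepts else 0.0)
        return alpha * v["text_score"] + (1 - alpha) * visual
    return sorted(videos, key=score, reverse=True)

videos = [
    {"id": "v1", "text_score": 0.80, "concepts": {"crowd": 0.2, "fire": 0.1}},
    {"id": "v2", "text_score": 0.75, "concepts": {"crowd": 0.9, "fire": 0.0}},
]
print([v["id"] for v in rerank("protest in the city centre", videos)])
# v2 overtakes v1 once the 'crowd' concept is taken into account.
```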