Text-based search of TV news stories

Our goal is to enable viewers to access TV programs based on their content. To this end, we present a system that automatically captures and processes TV news programs into a database that can be searched over the internet. Users search this database by submitting simple English queries. The result of a query is a hyperlinked list of matching news stories. Clicking on any item in the list immediately launches a video of the pertinent part of the news broadcast. We segment TV news broadcasts into distinct news stories and index each story as a separate entity, so that a query returns videos of individual news stories rather than the whole TV program. News programs are usually accompanied by a transcript in the form of closed caption text, which contains markers for story boundaries. Because of the live nature of TV news programs, the closed caption lags the actual audio/video by a varying amount of time, up to a few seconds. The closed caption text therefore has to be shifted to align it in time with the video. We use video and audio events to perform this synchronization. The closed caption text for each story is entered into a database. In response to a query, the database retrieves and ranks the matching closed caption stories. An HTML document is returned to the user that lists, for each story: 1) the name and time of the news program that the story belongs to, 2) thumbnails providing a visual summary of the story, and 3) the closed caption text. To view a news story, the user simply clicks on an item from the list, and the video for that story is streamed to a media player on the user's side. The system thus preserves the manner of presentation of the media, namely video for TV programs, while supporting the common search and selection techniques used on the web.
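
The segmentation, time shift, and ranked retrieval described above can be illustrated with a minimal Python sketch. It is not the system's actual implementation, and several things in it are assumptions made for illustration: the `>>>` story marker (the abstract does not name the marker), a single constant `lag_seconds` supplied by the caller (the system estimates the lag from video and audio events, which is not reproduced here), the simple term-count ranking, and the names `Story`, `segment_stories`, and `search`.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumed story-boundary marker; the abstract does not say which marker is used.
STORY_MARKER = ">>>"


@dataclass
class Story:
    start: float  # seconds into the video, after lag correction
    end: float
    text: str


def segment_stories(captions, lag_seconds, program_end):
    """Split (time, text) caption lines into stories and shift them earlier.

    lag_seconds is assumed to be known; estimating it from audio/video
    events is outside the scope of this sketch.
    """
    raw = []  # one (caption_start_time, [text parts]) entry per story
    for time, text in captions:
        is_boundary = text.startswith(STORY_MARKER)
        if is_boundary:
            text = text[len(STORY_MARKER):].strip()
        if is_boundary or not raw:
            raw.append((time, []))
        if text:
            raw[-1][1].append(text)

    stories = []
    for i, (start, parts) in enumerate(raw):
        shifted_start = max(0.0, start - lag_seconds)
        if i + 1 < len(raw):
            # A story ends where the next one starts.
            shifted_end = max(0.0, raw[i + 1][0] - lag_seconds)
        else:
            shifted_end = program_end
        stories.append(Story(shifted_start, shifted_end, " ".join(parts)))
    return stories


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def search(stories, query):
    """Return stories containing any query term, most occurrences first."""
    terms = tokenize(query)
    scored = []
    for story in stories:
        counts = Counter(tokenize(story.text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scored.append((score, story))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [story for _, story in scored]
```

Each returned `Story` carries the shifted start and end times, which is the information a front end would need to stream only the pertinent part of the broadcast. A production system would likely replace the term-count score with a proper text retrieval engine.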