Shot-level description and matching of video content

This paper discusses representational issues in describing and matching video content at the level of the individual shot, using a thesaurus of keywords that represent people, places and actions. We argue that a correct representation of shots must include temporal segments with roles, such as location, character, camera and action, each of which can be filled by keywords from the thesaurus.
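To make the proposed representation concrete, the following is a minimal sketch in Python, not the paper's own implementation. It assumes that each temporal segment carries one role and one keyword, that the thesaurus is a simple broader-term hierarchy, and that matching averages the best per-role keyword similarity. The thesaurus entries, the names `Segment`, `Shot`, `keyword_similarity` and `match_score`, and the 0.5 partial-match weight are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Roles named in the abstract.
ROLES = {"location", "character", "camera", "action"}

# Hypothetical thesaurus: each keyword maps to its broader term (None at the top).
THESAURUS = {
    "person": None,
    "woman": "person",
    "place": None,
    "kitchen": "place",
    "movement": None,
    "walk": "movement",
    "run": "movement",
    "camera-motion": None,
    "pan": "camera-motion",
}


@dataclass(frozen=True)
class Segment:
    """A temporal segment of a shot: one role filled by one thesaurus keyword."""
    role: str
    keyword: str
    start: float  # seconds from the start of the shot
    end: float

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")
        if self.keyword not in THESAURUS:
            raise ValueError(f"keyword not in thesaurus: {self.keyword}")
        if self.start >= self.end:
            raise ValueError("segment must have positive duration")


@dataclass
class Shot:
    shot_id: str
    segments: list = field(default_factory=list)


def ancestors(keyword):
    """Return the keyword followed by all of its broader terms."""
    chain = []
    while keyword is not None:
        chain.append(keyword)
        keyword = THESAURUS[keyword]
    return chain


def keyword_similarity(a, b):
    """1.0 for identical keywords, 0.5 if they share a broader term, else 0.0."""
    if a == b:
        return 1.0
    return 0.5 if set(ancestors(a)) & set(ancestors(b)) else 0.0


def match_score(query, shot):
    """Average, over query segments, of the best same-role keyword similarity."""
    if not query.segments:
        return 0.0
    total = 0.0
    for q in query.segments:
        scores = [keyword_similarity(q.keyword, s.keyword)
                  for s in shot.segments if s.role == q.role]
        total += max(scores, default=0.0)
    return total / len(query.segments)


shot = Shot("s1", [
    Segment("location", "kitchen", 0.0, 10.0),
    Segment("character", "woman", 0.0, 10.0),
    Segment("action", "walk", 2.0, 6.0),
    Segment("camera", "pan", 2.0, 6.0),
])
query = Shot("q", [
    Segment("character", "woman", 0.0, 1.0),
    Segment("action", "run", 0.0, 1.0),
])
print(match_score(query, shot))  # 0.75: exact character match, partial action match
```

This sketch stores the segment times but does not use them in the score; a fuller matcher would presumably also compare the temporal ordering or overlap of segments.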