Hidden Markov model parsing of video programs

This paper introduces statistical parsing of video programs using hidden Markov models (HMMs). The fundamental units of a video program are shots and transitions (fades, dissolves, etc.). These units are combined to form more complex structures, such as scenes. Parsing a video allows us to recognize higher-level story abstractions. These higher-level story elements can be used to summarize programs, to identify their most important parts, and for many other purposes. The paper is of interest in cinematography, particularly for program summarization.
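
To make the parsing idea concrete, the following is a minimal sketch of Viterbi decoding over a toy HMM whose hidden states are shots and transitions. It is not the model from the paper: the state set, the quantized "low/mid/high" frame-difference observations, and every probability below are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical hidden states: one shot state and two transition types.
STATES = ("shot", "dissolve", "fade")

# Hypothetical parameters, invented for illustration (not from the paper).
START = {"shot": 0.90, "dissolve": 0.05, "fade": 0.05}
TRANS = {
    "shot":     {"shot": 0.90, "dissolve": 0.05, "fade": 0.05},
    "dissolve": {"shot": 0.30, "dissolve": 0.70, "fade": 0.00},
    "fade":     {"shot": 0.30, "dissolve": 0.00, "fade": 0.70},
}
# Observations are assumed to be frame-to-frame differences quantized
# into three symbols: "low", "mid", "high".
EMIT = {
    "shot":     {"low": 0.85, "mid": 0.10, "high": 0.05},
    "dissolve": {"low": 0.10, "mid": 0.80, "high": 0.10},
    "fade":     {"low": 0.05, "mid": 0.15, "high": 0.80},
}


def _log(p):
    """Logarithm that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations):
    """Return the most likely state sequence for the observed symbols."""
    if not observations:
        return []
    # Log-probability of the best path ending in each state at frame 0.
    scores = [{s: _log(START[s]) + _log(EMIT[s][observations[0]])
               for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for obs in observations[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + _log(TRANS[r][s]))
            cur[s] = prev[best] + _log(TRANS[best][s]) + _log(EMIT[s][obs])
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def segments(path):
    """Collapse a per-frame label path into (label, start, end) runs."""
    runs = []
    for i, label in enumerate(path):
        if runs and runs[-1][0] == label:
            runs[-1] = (label, runs[-1][1], i)
        else:
            runs.append((label, i, i))
    return runs
```

Calling `segments(viterbi(symbols))` on a list of quantized frame differences yields labeled runs of frames. Such runs are the kind of low-level units that a higher-level model could then group into scenes.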