Hierarchical Latent Variable Models for Story Analysis of TV Dramas

With the advancement of statistical machine learning, various machine learning methods have been applied to dynamic analysis of multimodal streams. However, previous studies have limitations for tackling various real-world streams because they focus on utilizing very limited characteristics of certain domains such as repetition of fixed frames. In this paper, we introduce a generative model-based segmenting method in which a story segment of a video stream is estimated through the likelihood of a given model to explain incoming data without requiring prior knowledge. There exists a profound question of how to compare each segment's latent structure parameters. In the proposed model, this difficulty is circumvented by computing likelihood of a new frame given a story model. We apply the proposed method to distinguishing several story segments in a TV drama episode. We employ LDA (Latent Dirichlet Allocation) framework for generating a story segment model. The proposed method is validated by comparing its results with those of human estimation.