A hierarchical and multi-modal based algorithm for lead detection and news program narrative parsing

In this paper, we propose a hierarchical, multi-modal news item detection algorithm for parsing TV news program videos; it can be viewed as a mid-stage solution between single-modal and semantic-based approaches. We first investigate the production model of TV news programs and then use the resulting domain knowledge to develop the algorithm. With the aid of multi-modal features, such as volume and zero-crossing rate in the audio track and keyframes and human faces in the video track, the proposed algorithm achieved satisfactory precision and recall when parsing a 6-hour news program test video.
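
As an illustration of the two audio features named above, the following minimal sketch computes a per-frame volume and zero-crossing rate. It is not the paper's implementation: the function name `frame_features`, the use of non-overlapping frames, the definition of volume as root-mean-square amplitude, and the normalization of the zero-crossing rate by the number of adjacent sample pairs are all assumptions made for this example.

```python
import math


def frame_features(samples, frame_len):
    """Return a list of (volume, zcr) tuples, one per non-overlapping frame.

    Assumptions (not taken from the paper):
      - volume is the root-mean-square amplitude of the frame;
      - zcr is the fraction of adjacent sample pairs whose signs differ,
        with zero treated as non-negative;
      - trailing samples that do not fill a whole frame are ignored.
    """
    if frame_len < 2:
        raise ValueError("frame_len must be at least 2")

    features = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]

        # Volume: RMS amplitude over the frame.
        volume = math.sqrt(sum(x * x for x in frame) / frame_len)

        # Zero-crossing rate: count sign changes between neighbours.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        zcr = crossings / (frame_len - 1)

        features.append((volume, zcr))
    return features


# A rapidly alternating frame followed by a constant, quieter frame.
feats = frame_features([1, -1, 1, -1, 0.5, 0.5, 0.5, 0.5], frame_len=4)
```

In this toy input the first frame alternates sign on every sample and so has a high zero-crossing rate, while the second frame is constant and has none. How such values are thresholded or combined with the video features is not specified in the abstract and is left open here.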