An Effective News Anchorperson Shot Detection Method Based on Adaptive Audio/Visual Model Generation

This paper proposes a multi-modal method that improves anchorperson shot detection for news story segmentation. Anchorperson voice information is used to verify the anchorperson shot candidates extracted from visual information. The algorithm first extracts anchorperson voice shot candidates using time and silence conditions. Anchorperson templates are then generated from the face and clothing information in the extracted anchorperson voice shots. Next, anchorperson voice models are created after the anchorperson voice shots containing two or more voices are separated out. Finally, the anchorperson voice models verify the anchorperson shot candidates obtained from visual information. The method is tested on 720 minutes of news programs, and experimental results are presented.
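The first step, extracting anchorperson voice shot candidates with time and silence conditions, can be illustrated with a minimal sketch. The abstract does not state the actual conditions, so everything below is an assumption made for illustration: the `Shot` structure, the per-frame energy feature, the reading of the "time" condition as a minimum shot duration, the reading of the "silence" condition as an upper bound on the fraction of silent frames, and all threshold values.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Shot:
    """Hypothetical shot record; not a structure defined in the paper."""
    start: float                    # shot start time in seconds
    end: float                      # shot end time in seconds
    frame_energy: Sequence[float]   # assumed short-time audio energy per frame

    @property
    def duration(self) -> float:
        return self.end - self.start


def silence_ratio(frame_energy: Sequence[float], silence_threshold: float) -> float:
    """Fraction of frames whose energy falls below the silence threshold."""
    if not frame_energy:
        return 1.0  # treat a shot with no audio frames as fully silent
    silent = sum(1 for e in frame_energy if e < silence_threshold)
    return silent / len(frame_energy)


def voice_shot_candidates(
    shots: Sequence[Shot],
    min_duration: float = 10.0,       # assumed "time" condition, in seconds
    silence_threshold: float = 0.01,  # assumed energy level counted as silence
    max_silence_ratio: float = 0.3,   # assumed "silence" condition
) -> List[Shot]:
    """Keep shots that are long enough and mostly non-silent."""
    return [
        s for s in shots
        if s.duration >= min_duration
        and silence_ratio(s.frame_energy, silence_threshold) <= max_silence_ratio
    ]


if __name__ == "__main__":
    shots = [
        Shot(0.0, 15.0, [0.5] * 10),                 # long, continuous speech
        Shot(15.0, 18.0, [0.5] * 10),                # too short
        Shot(18.0, 40.0, [0.001] * 8 + [0.5] * 2),   # long but mostly silent
    ]
    for s in voice_shot_candidates(shots):
        print(f"candidate: {s.start:.1f}s to {s.end:.1f}s")
```

Under these assumed thresholds only the first example shot is kept. In the method described above, the surviving candidates would then feed template generation and voice model creation, which this sketch does not attempt to reproduce.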
