Automatic classification of TV news articles based on telop character recognition

The purpose of this study is to develop a multimedia database system for TV news video data. TV news video data consist of speech, characters and images. In this study, telop characters are recognized and attached to the news articles as indices for classification. First, telop frames, i.e. frames that include telop characters, are detected, and then the telop characters are extracted and recognized. Through morphological analysis of the recognized telop characters, keywords consisting of more than two characters are extracted. These keywords are used as indices to classify the TV news articles. We carried out experiments on 30 days of NHK 5-minute news and obtained a telop character extraction rate of 95.4%, a character recognition rate of 81.4% and an article classification rate of 83.8%. Improving the character recognition rate from 44.1% to 81.4% raised the article classification rate by 43.4%.
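The keyword indexing and classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that telop recognition and morphological analysis have already produced a list of tokens, and the function names `extract_keywords` and `classify`, the overlap-count scoring, and the category keyword sets are all hypothetical choices made for illustration. English tokens stand in for the recognized Japanese telop text.

```python
from collections import Counter


def extract_keywords(tokens):
    """Keep tokens of more than two characters, as stated in the abstract.

    Assumption: `tokens` is the output of an upstream morphological
    analyzer applied to the recognized telop text.
    """
    return [t for t in tokens if len(t) > 2]


def classify(keywords, category_index):
    """Return the category whose keyword set matches the most keywords.

    Assumption: `category_index` maps a category name to a set of
    index keywords; overlap counting is an illustrative scoring rule,
    not one confirmed by the source. Returns None when nothing matches.
    """
    scores = Counter()
    for kw in keywords:
        for category, vocab in category_index.items():
            if kw in vocab:
                scores[category] += 1
    if not scores:
        return None
    return max(scores, key=scores.get)


# Hypothetical example data.
category_index = {
    "politics": {"election", "parliament"},
    "sports": {"baseball"},
}
tokens = ["election", "of", "parliament", "in"]
keywords = extract_keywords(tokens)
label = classify(keywords, category_index)
```

In this sketch a misrecognized character changes a token so that it no longer matches any index keyword, which is one plausible reading of why the classification rate depends so strongly on the character recognition rate.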
