Multilingual Video Indexing and Retrieval Employing an Information Extraction Tool for Turkish News Texts: A Case Study

This paper proposes a multilingual video indexing and retrieval system that relies on an information extraction tool for Turkish, a hybrid named entity recognizer, to determine the semantic annotations of the videos under consideration. The system is run on a set of news videos in English and comprises several other components: an automatic speech recognition system for English, an English-to-Turkish machine translation system, a news video database, and a semantic video retrieval interface. The performance evaluation shows that the system components achieve promising results, which provide evidence for the applicability of the system. The proposed system and its application to the video set are significant because they constitute a plausible case study of multilingual video indexing and retrieval that uses information extraction as the central technique for semantic video indexing.
