Extending an Information Extraction tool set to Central and Eastern European languages

Abstract

In a highly multilingual and multi-cultural environment such as in the European Commission with soon over twenty official languages, there is an urgent need for text analysis tools that use minimal linguistic knowledge so that they can be adapted to many languages without much human effort. We are presenting two such Information Extraction tools that have already been adapted to various Western and Eastern European languages: one for the recognition of date expressions in text, and one for the detection of geographical place names and the visualisation of the results in geographical maps. An evaluation of the performance has produced very satisfying results.

1 Introduction

The international staff of the European Commission (EC), like any other multinational organisation, has to deal with documents written in many different languages. Multilingual text analysis tools can help them to be more efficient and to get access to information written in documents they may not understand. However, not many commercial text analysis tools exist that can analyse texts in all official European Union (EU) languages, and we do not know of any tool that covers all of the over 20 languages that will be used after the planned Enlargement of the EU. The