Knowledge-Based Multilingual Document Analysis

The growing availability of multilingual resources, such as EuroWordNet, has recently inspired the development of large-scale linguistic technologies, e.g. multilingual IE and Q&A, that were considered infeasible until a few years ago. This paper presents a system for the categorisation and automatic authoring of news streams in different languages. The system adopts a knowledge-based approach to Information Extraction as a support for hyperlinking. Authoring across documents in different languages is triggered by Named Entity and event recognition. Events in texts are matched by discourse processing driven by a large-scale world model. This kind of multilingual analysis relies on a lexical knowledge base of nouns (i.e. the EuroWordNet Base Concepts) shared among the English, Spanish and Italian lexicons. The impact of these design choices on language independence, and the possibilities they open for automatic learning of the event hierarchy, are discussed.
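As a rough illustration of how a noun knowledge base shared across languages could support cross-lingual hyperlinking, the sketch below maps language-specific trigger nouns to a common concept and links two documents when they share both the concept and at least one named entity. Everything in it is an assumption made for illustration: the toy lexicon, the concept labels (`BC_ACQUIRE`, `BC_MERGE`, which are not real EuroWordNet identifiers), and the functions `event_signature` and `should_link`. It does not reproduce the system's discourse processing or world model.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

# Hypothetical toy lexicon: (language, noun) -> shared concept label.
# The labels are invented placeholders, not EuroWordNet Base Concept IDs.
SHARED_CONCEPTS: Dict[Tuple[str, str], str] = {
    ("en", "acquisition"): "BC_ACQUIRE",
    ("es", "adquisición"): "BC_ACQUIRE",
    ("it", "acquisizione"): "BC_ACQUIRE",
    ("en", "merger"): "BC_MERGE",
    ("es", "fusión"): "BC_MERGE",
    ("it", "fusione"): "BC_MERGE",
}

Signature = Tuple[str, FrozenSet[str]]


def event_signature(
    lang: str, trigger_noun: str, entities: Iterable[str]
) -> Optional[Signature]:
    """Return a language-independent (concept, entities) pair, or None
    when the trigger noun is not covered by the shared lexicon."""
    concept = SHARED_CONCEPTS.get((lang, trigger_noun.lower()))
    if concept is None:
        return None
    return concept, frozenset(e.lower() for e in entities)


def should_link(a: Optional[Signature], b: Optional[Signature]) -> bool:
    """Link two documents if they share the event concept and at least
    one named entity (a deliberately simplistic criterion)."""
    if a is None or b is None:
        return False
    return a[0] == b[0] and bool(a[1] & b[1])


english_doc = event_signature("en", "Acquisition", ["Acme Corp", "Globex"])
italian_doc = event_signature("it", "acquisizione", ["Globex"])
spanish_doc = event_signature("es", "fusión", ["Globex"])

print(should_link(english_doc, italian_doc))
print(should_link(english_doc, spanish_doc))
```

In this sketch the English and Italian documents are linked because both resolve to the same concept and mention the same entity, while the Spanish document resolves to a different concept and is not linked.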
