Natural Language Processing and Big Data - An Ontology-Based Approach for Cross-Lingual Information Retrieval

Extracting relevant information in a multilingual context from massive amounts of unstructured, structured and semi-structured data is a challenging task. Various theories have been developed and applied to ease access to multicultural and multilingual resources. This paper describes a methodology for developing an ontology-based Cross-Language Information Retrieval (CLIR) application. It shows how Natural Language (NL) queries in any language can be translated by means of a knowledge-driven approach that semi-automatically maps natural language to formal language, thereby simplifying and improving human-computer interaction and communication. The outlined research activities are based on Lexicon-Grammar (LG), a method devised for natural language formalization, automatic textual analysis and parsing. Thanks to its main characteristics, LG is independent of factors that are critical for other approaches, i.e. interaction type (voice or keyboard-based), length of sentences and propositions, type of vocabulary used and restrictions due to users' idiolects. The feasibility of our knowledge-based methodological framework, which allows both data and metadata to be mapped, will be tested for CLIR by implementing an early domain-specific prototype system.
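The idea of mapping natural language to a formal language through shared ontology concepts can be illustrated with a minimal Python sketch. It is not the system described in this paper: the lexicon entries, the `ex:` concept identifiers, the `ex:mentions` property and all function names are hypothetical, and the single-token lookup stands in for the much richer Lexicon-Grammar dictionaries and local grammars, which also handle multi-word units.

```python
# Hypothetical toy lexicon: (language, surface form) -> ontology concept ID.
# A real LG resource would be an electronic dictionary, not a Python dict.
LEXICON = {
    ("en", "painting"): "ex:Painting",
    ("it", "dipinto"): "ex:Painting",
    ("fr", "tableau"): "ex:Painting",
    ("en", "museum"): "ex:Museum",
    ("it", "museo"): "ex:Museum",
    ("fr", "musée"): "ex:Museum",
}

# Inverse view of the lexicon: concept ID -> preferred label per language.
LABELS = {}
for (lang, form), concept in LEXICON.items():
    LABELS.setdefault(concept, {})[lang] = form


def map_query_to_concepts(query, lang):
    """Return the ontology concepts found in a query, in order of appearance."""
    concepts = []
    for token in query.lower().split():
        token = token.strip(".,;:?!")
        concept = LEXICON.get((lang, token))
        if concept is not None and concept not in concepts:
            concepts.append(concept)
    return concepts


def translate_query(query, source_lang, target_lang):
    """Translate query terms by pivoting through language-neutral concepts."""
    concepts = map_query_to_concepts(query, source_lang)
    return [LABELS[c][target_lang] for c in concepts]


def to_formal_query(concepts):
    """Build a SPARQL-like query string (assumed ex:mentions property)."""
    patterns = " ".join(f"?doc ex:mentions {c} ." for c in concepts)
    return f"SELECT ?doc WHERE {{ {patterns} }}"


if __name__ == "__main__":
    found = map_query_to_concepts("dipinto nel museo", "it")
    print(found)
    print(translate_query("dipinto nel museo", "it", "fr"))
    print(to_formal_query(found))
```

Because the concepts act as a pivot, adding a language in this sketch only requires new lexicon entries, and the formal query stays the same whatever the language of the user's input.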
