Cross-Language Information Access through Phrase Browsing

This paper presents a cross-language retrieval system that integrates shallow parsing and lexical semantic databases in an interactive approach to information access. At indexing time, the system extracts a list of phrases for every language in the collection. At search time, it bridges the gap between the user's query and the relevant phrases in the collection, in any language, by expanding and translating individual query terms and retaining only the phrases that are actually relevant in the collection. The user can access information through a standard ranked list of documents or through a hierarchy of phrasal information, in which selecting a phrase modifies the ranked list and gives access to the documents related to that phrase. We believe this interactive setting optimises the use of simple and robust natural language resources and techniques to facilitate cross-language information access.
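
The search-time step described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the phrase index, the lexicon, and the function names are all hypothetical, the toy dictionary merely stands in for a lexical semantic database, and matching phrases as unordered word sets is an assumption made here because word order differs across languages.

```python
from itertools import product

# Hypothetical toy data: phrases extracted per language at indexing time.
PHRASE_INDEX = {
    "en": {("child", "labour"), ("labour", "market")},
    "es": {("trabajo", "infantil"), ("mercado", "laboral")},
}

# Hypothetical term-level expansions and translations, standing in for
# a lexical semantic database.
LEXICON = {
    "child": {"en": {"child", "kid"}, "es": {"niño", "infantil"}},
    "labour": {"en": {"labour", "work"}, "es": {"trabajo", "laboral"}},
}


def candidate_phrases(query_terms, lang):
    """Combine the variants of each query term into candidate word sets."""
    variants = [LEXICON.get(term, {}).get(lang, set()) for term in query_terms]
    if not all(variants):
        # At least one term has no known variant in this language.
        return set()
    return {frozenset(combo) for combo in product(*variants)}


def relevant_phrases(query_terms):
    """Keep only the indexed phrases that match some candidate, per language."""
    result = {}
    for lang, phrases in PHRASE_INDEX.items():
        candidates = candidate_phrases(query_terms, lang)
        result[lang] = sorted(p for p in phrases if frozenset(p) in candidates)
    return result


if __name__ == "__main__":
    print(relevant_phrases(["child", "labour"]))
```

Term-by-term expansion generates many candidate combinations, most of them meaningless. In this sketch, checking them against the phrases extracted from the collection discards the spurious ones without any explicit disambiguation step.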
