An Adaptive and Distributed Framework for Advanced IR

It has often been noted that modern IR ((Gregory, 1991), (Alan, 1991)) should be sensitive to document content, should integrate interactivity, multimodality and multilinguality on a large scale, and should support the highly dynamic nature of current information access needs (i.e. be adaptable to changes in sources, language and content/style). This paper discusses the architectural design of TREVI (Text Retrieval and Enrichment for Vital Information - ESPRIT project EP23311), a distributed, object-oriented Java/CORBA system for NLP-driven news classification, enrichment and delivery. The advanced features of TREVI include the extensive use of a well-defined model ((Mazzucchelli, 1999)), based on a typed mechanism for static/dynamic control of the distributed process and on a principled representation of linguistic types as computational OO data structures, and the adaptivity of the linguistic processors employed (namely, the robust, lexically-driven parser). The original aspect of TREVI is its combination of a systematic design approach with advanced, adaptive NLP processors for content-driven text classification. A full toolkit was developed within the operational scenarios of three different users (i.e. three different information providers) and in two languages (English and Spanish); its good performance (classification accuracy and usability) provides initial evidence of the success of this approach.

[1] L. Mazzucchelli, et al. A model for Java/CORBA and OODBMS distributed architectures, 1999, Proceedings of the International Symposium on Distributed Objects and Applications.

[2] John D. Lafferty, et al. A Robust Parsing Algorithm for Link Grammars, 1995, IWPT.

[3] Roberto Basili, et al. Automatic Adaptation of WordNet to Sublanguages and to Computational Tasks, 1998, WordNet@ACL/COLING.

[4] Roberto Basili, et al. Lexicalizing a shallow parser, 1999.

[5] Roberto Basili, et al. Language sensitive text classification, 2000, RIAO.

[6] Luigi Mazzucchelli, et al. A model for Java/CORBA & OODBMS distributed architectures, 1999.

[7] Dan Harkey, et al. Client/Server programming with Java and Corba, 1997.

[8] Roberto Basili, et al. Corpus-Driven Unsupervised Learning of Verb Subcategorization Frames, 1997, AI*IA.

[9] Roberto Basili, et al. Customizable Modular Lexicalized Parsing, 2000, IWPT.

[10] Roberto Basili, et al. Contextual Word Sense Tuning and Disambiguation, 1997, Appl. Artif. Intell.

[11] Yorick Wilks, et al. Software Infrastructure for Natural Language Processing, 1997, ANLP.

[12] Roberto Basili, et al. Representing Document Content via an Object-Oriented Paradigm, 1999, ISMIS.

[13] Ismael Sanz, et al. Distributed objects in a large scale text processing system (industrial case study), 1999, Proceedings of the International Symposium on Distributed Objects and Applications.

[14] Steven Abney, et al. Part-of-Speech Tagging and Partial Parsing, 1997.

[15] David Yarowsky, et al. Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora, 1992, COLING.

[16] Maria Teresa Pazienza, et al. Information Extraction: A Multidisciplinary Approach to an Emerging Information Technology, 1997, Lecture Notes in Computer Science.

[17] Roberto Basili, et al. A Shallow Syntactic Analyser to Extract Word Associations from Corpora, 1992.

[18] Zhonghua Yang, et al. CORBA: A Platform for Distributed Object Computing (A State-of-the-Art Report on OMG/CORBA), 1996.

[19] Rémi Zajac, et al. An Open Distributed Architecture for Reuse and Integration of Heterogeneous NLP Components, 1997, ANLP.

[20] Roberto Basili, et al. An empirical approach to Lexical Tuning, 2000.

[21] Maria Teresa Pazienza. Information Extraction: Towards Scalable, Adaptable Systems, 1999.

[22] Roberto Basili, et al. Efficient Parsing for Information Extraction, 1998, ECAI.

[23] Roberto Basili, et al. Modelling Syntactic Uncertainty in Lexical Acquisition from Texts, 1994, J. Quant. Linguistics.

[24] Jean-Pierre Chanod, et al. Incremental Finite-State Parsing, 1997, ANLP.