Software Infrastructure for Language Resources: a Taxonomy of Previous Work and a Requirements Analysis

This paper presents a taxonomy of previous work on infrastructures, architectures, and development environments for representing and processing Language Resources (LRs), corpora, and annotations. This classification is then used to derive a set of requirements for a Software Architecture for Language Engineering (SALE). The analysis shows that a SALE should address common problems and support typical activities in the development, deployment, and maintenance of Language Engineering (LE) software. The results will inform the next phase: the construction of an infrastructure for LR production, distribution, and access.
