Evolving GATE to meet new challenges in language engineering

In this paper we present recent work on GATE, a widely used framework and graphical development environment for creating and deploying Language Engineering (LE) components and resources in a robust fashion. The GATE architecture has facilitated the development of a number of successful applications for various language processing tasks (such as Information Extraction, dialogue and summarisation), the building and annotation of corpora, and the quantitative evaluation of LE applications. The focus of this paper is on recent developments in response to new challenges in Language Engineering: the Semantic Web, integration with Information Retrieval and data mining, and the need for machine learning support.
