Universities of Leeds, Sheffield and York

This paper presents GATE Teamware, an open-source, web-based, collaborative text annotation framework. It enables users to carry out complex corpus annotation projects involving distributed annotator teams. Different user roles are provided (annotator, manager, administrator) with customisable user interface functionalities, in order to support the complex workflows and user interactions that occur in corpus annotation projects. Documents may be pre-processed automatically, so that human annotators can begin with text that has already been pre-annotated, which makes them more efficient. The user interface is simple to learn, aimed at non-experts, and runs in an ordinary web browser, without the need for additional software installation. GATE Teamware has been evaluated through the creation of several gold standard corpora and internal projects, as well as through external evaluation in commercial and EU text annotation projects. It is
