The ZT Corpus (Basque Corpus of Science and Technology) is a tagged collection of specialised texts in Basque, intended as a major resource for research and development on written technical Basque: terminology, syntax and style. It was released in December 2006 and can be queried at http://www.ztcorpusa.net. The ZT Corpus stands out among Basque corpora for several reasons: it is the first specialised corpus in Basque; it has been designed as a methodological and functional reference for future projects, such as a national corpus for Basque; it is the first corpus in Basque annotated in a TEI-P4 compliant XML format; it is the first written corpus in Basque to be distributed by ELDA; and it has a friendly and sophisticated query interface. The corpus has two kinds of annotation: a structural annotation and a stand-off linguistic annotation. It is composed of two parts: a balanced part of 1.6 million words, whose annotation has been revised by hand, and an automatically tagged part of 6 million words. The project is not closed: we intend to enlarge the corpus gradually and to keep improving it. We also present the technology and tools used to build the corpus. These tools, Corpusgile and Eulia, provide a flexible and extensible infrastructure for creating, visualising and managing corpora, and for consulting, visualising and modifying the annotations generated by linguistic tools. Finally, we introduce the web interface for querying the ZT Corpus, which offers several advanced features that are new to Basque corpora.