We describe Glossa, a web-based corpus query system that combines the expressiveness of regular query languages with the user-friendliness of a graphical interface. Since corpus users are usually linguists with little interest in technical matters, we have developed a system that requires no prior knowledge of the search system. Nor does the user need to know abbreviations for metavariables such as part of speech and source text. All searches are specified with checkboxes, pull-down menus, or by typing plain characters to form words or other strings. A query for more than one word is built by adding another query box, and a query for part of a word by choosing a feature such as start of word. Glossa also offers a wide range of viewing and post-processing options. Collocations can be counted in a number of ways and displayed as different kinds of graphical charts. Individual results can easily be annotated or deleted before further processing. Glossa is already in use for a number of corpora. Corpus administrators can easily adapt it to a wide range of corpora, including multilingual corpora and corpora with audio and video content.