CQPweb — combining power, flexibility and usability in a corpus analysis tool

CQPweb is a new web-based corpus analysis system, intended to address the conflicting requirements for usability and power in corpus analysis software. To do this, its user interface emulates the BNCweb system. Like BNCweb, CQPweb is built on two separate query technologies: the IMS Open Corpus Workbench and the MySQL relational database. CQPweb’s main innovative feature is its flexibility; its more generalised data model makes it compatible with any corpus. The analysis options available in CQPweb include: concordancing; collocations; distribution tables and charts; frequency lists; and keywords or key tags. An evaluation of CQPweb against criteria earlier laid down for a future web-based corpus analysis tool suggests that it fulfils many, but not all, of the requirements foreseen for such a piece of software. Despite some limitations, in making a sophisticated query system accessible to untrained users, CQPweb combines ease of use, power and flexibility to a very high degree.
