Developing Software for Corpus Research

Despite the central role of the computer in corpus research, programming is generally not seen as a core skill within corpus linguistics. As a consequence, limitations in software for text and corpus analysis slow down the progress of research, and analysts often have to rely on third-party software, or even on manual data analysis if no suitable software is available. Apart from software itself, data formats are also of great importance for text processing. Here too, many practitioners are not fully aware of the options available to them, and idiosyncratic text formats often make sharing of resources difficult if not impossible. This article discusses some issues relating to both data and processing, with the aim of making researchers more aware of the choices available to them when using computers in linguistic research. It also describes an easy approach to automating some common text processing tasks, one that can be learned without knowledge of actual computer programming.