Introduction: Data and Methods in Computer-Mediated Discourse Analysis

Computer-mediated discourse (CMD) encompasses all kinds of interpersonal communication carried out on the Internet, e.g., by email, instant messaging, web discussion boards, and chat channels (Herring, 2001, 2004). In the last decade, CMD has attracted a great deal of research attention from linguistic perspectives, especially pragmatic, discourse-analytic, and sociolinguistic ones. However, methodological reflection lags behind that in other areas of discourse studies. To begin with, while collecting data on the Internet seems trivial at first sight, researchers conducting CMD studies face a variety of non-trivial questions. These may concern the size and representativeness of data samples, data processing techniques, the delimitation of genres, and the kind and amount of contextual information required, as well as ethical issues such as anonymity and privacy protection. Much research in the area has been based on small, ad hoc data sets; there is a lack of standard guidelines for CMD corpus design and of publicly available CMD corpora (Beißwenger & Storrer, 2008).