A corpus study of variation in written Chinese

Abstract

This paper reports the findings of a preliminary study of written Chinese based on the Lancaster Corpus of Mandarin Chinese (LCMC, McEnery & Xiao 2004). The first part introduces the stylistic features examined and briefly describes their distributional patterns across the selected written registers. Then, using a multi-feature, multi-dimensional framework (Biber 1988) with correspondence analysis as the data reduction method, the study identifies and interprets three dimensions of variation. The results reveal extensive linguistic variation across written Chinese registers, complementing previous observations about stylistic differences between spoken and written Chinese. Finally, issues concerning feature selection and dimension interpretation are discussed.