Part-of-speech studies in Chinese

ABSTRACT This paper studies parts of speech in Chinese using data taken from the Modern Chinese Dictionary (5th edition). First, we determine the part-of-speech polyfunctionality (ambiguity) of words and analyse the resulting distribution and rank-frequency sequence. The Waring and the right-truncated modified Zipf-Alekseev distributions are successfully fitted to these data. Second, we present the 121 patterns yielded by the 3742 polyfunctional words in total. The polyfunctionality of patterns follows the positive Cohen-binomial distribution, while their rank-frequency sequence follows the negative hypergeometric distribution. Third, we discuss the mechanism behind polyfunctionality: as can be expected of an analytic language, Chinese words diversify in function rather than in form. The Popescu-Altmann function captures the distribution of the variants of each part of speech. Fourth, we analyse the polyfunctionality distributions of the individual parts of speech. Of the 12 parts of speech distinguished by the dictionary, six can be modelled by the Poisson distribution, four by the mixed Poisson distribution, and two by the Singh-Poisson distribution. To obtain a general form, we apply the mixed Poisson distribution to all parts of speech by controlling one parameter. We also make a first attempt to plot the polyfunctionality distributions of the individual parts of speech in Ord's system; surprisingly, the plot approximates a hyperbola.
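To make the two computational steps named above more concrete, the sketch below (a) counts how many words carry x distinct part-of-speech tags, giving a frequency distribution f(x), and (b) computes Ord's coordinates for such a distribution, I = m2/m1' and S = m3/m2, where m1' is the mean and m2, m3 are the second and third central moments. A Poisson distribution has I = S = 1, which is why Ord's plane is a convenient place to compare the parts of speech. This is a minimal sketch under stated assumptions: the toy lexicon, the placeholder words `w1`..`w5` and their tags are hypothetical and not taken from the Modern Chinese Dictionary, and the helper names are illustrative rather than the authors' procedure.

```python
from collections import Counter


def polyfunctionality_distribution(lexicon):
    """Count how many words carry x distinct part-of-speech tags.

    `lexicon` maps a word to an iterable of its tags (hypothetical format).
    Returns a Counter {x: number of words with x distinct tags}.
    """
    return Counter(len(set(tags)) for tags in lexicon.values())


def ord_criterion(freq):
    """Return Ord's (I, S) for a discrete frequency distribution {x: f(x)}.

    I = m2 / mean and S = m3 / m2, with m2 and m3 the second and third
    central moments computed from the observed frequencies.
    """
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    m2 = sum((x - mean) ** 2 * f for x, f in freq.items()) / n
    m3 = sum((x - mean) ** 3 * f for x, f in freq.items()) / n
    if mean == 0 or m2 == 0:
        raise ValueError("Ord's criterion is undefined for zero mean or variance")
    return m2 / mean, m3 / m2


# Hypothetical toy lexicon: placeholder words with made-up tag sets.
lexicon = {
    "w1": ["n"],
    "w2": ["n", "v"],
    "w3": ["v"],
    "w4": ["a", "d", "v"],
    "w5": ["n", "v"],
}

freq = polyfunctionality_distribution(lexicon)
I, S = ord_criterion(freq)
print(dict(freq), round(I, 4), round(S, 4))
```

For this toy lexicon the distribution is {1: 2, 2: 2, 3: 1}, so the mean is 1.8, m2 = 0.56 and m3 = 0.144, giving I ≈ 0.3111 and S ≈ 0.2571. In practice one point (I, S) would be computed per part of speech and the resulting points plotted together.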