Adjective Density as a Text Formality Characteristic for Automatic Text Classification: A Study Based on the British National Corpus

In this article, we report findings from an investigation into the correlation between adjective density, calculated as the proportion of adjectives among word tokens, and degrees of text formality, as part of an attempt to examine the potential application of adjectives in automatic text classification and identification. Correlations obtained from the training corpus are compared with human rankings of the text categories concerned in the study and then applied to unseen data in the test set. A linear regression analysis suggests a strong correlation between degrees of text formality and adjective density. With a weighted average F-measure of 0.606 achieved by a Naive Bayes classifier, the research identifies adjectives as a powerful differentia of text categories amongst the open word classes, a feature that has been generally ignored by past studies in automatic text categorization. The empirical findings suggest that the use of adjective density can enhance practical systems for automatic text classification.
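
As an illustration of the measure defined above, adjective density can be sketched as a simple ratio over part-of-speech-tagged tokens. The sketch below is not the study's implementation. It assumes tokens carry BNC CLAWS5 (C5) tags, that the adjective tags are AJ0, AJC and AJS, and that punctuation tags (those beginning with PU) do not count as word tokens. The function name `adjective_density` and the toy sentence are hypothetical.

```python
from typing import Iterable, Tuple

# Assumption: BNC CLAWS5 (C5) adjective tags; the study's tag set may differ.
ADJECTIVE_TAGS = {"AJ0", "AJC", "AJS"}


def adjective_density(tagged_tokens: Iterable[Tuple[str, str]]) -> float:
    """Return the proportion of adjective tokens among word tokens."""
    word_count = 0
    adjective_count = 0
    for _word, tag in tagged_tokens:
        # Assumption: punctuation (PUL, PUN, PUQ, PUR) is not a word token.
        if tag.startswith("PU"):
            continue
        word_count += 1
        if tag in ADJECTIVE_TAGS:
            adjective_count += 1
    # An empty text has no defined density; 0.0 is returned by convention here.
    return adjective_count / word_count if word_count else 0.0


# Toy, hand-tagged sentence for illustration only (not taken from the corpus).
sample = [
    ("The", "AT0"),
    ("empirical", "AJ0"),
    ("findings", "NN2"),
    ("are", "VBB"),
    ("strong", "AJ0"),
    (".", "PUN"),
]
print(adjective_density(sample))
```

Computed per text and averaged per category, such a value could serve as the single predictor in a regression against formality rank or as a feature for a classifier. Ambiguity tags such as AJ0-NN1 are not handled in this sketch.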