Do We Need Linguistics When We Have Statistics? A Comparative Analysis of the Contributions of Linguistic Cues to a Statistical Word Grouping System

We present a comparative analysis of the performance of a statistics-based system for the formation of semantic groups of adjectives when various sources of linguistic knowledge are introduced. We identify four different types of shallow linguistic knowledge that are applicable to this system, and we quantify the performance gained by incorporating each such knowledge module. We perform experiments for different corpus sizes and different inputs (sets of adjectives to group), collect data on the usefulness of each linguistic module, assess the statistical significance of the results, and compare the contributions of the linguistic knowledge sources against each other. We also assess the overall effect linguistic knowledge has in our system. Our results show that linguistic knowledge causes a significant increase in the performance of the system. We conclude by discussing how these positive results can be generalized to other problems in statistical NLP.