What to Infer from a Description
Ted

Recent work on identifying dependent collocations among consecutive words (i.e., dependent bigrams) has applied inferential statistical methods where descriptive ones may have been more appropriate and easier to use. In this paper we distinguish between inferential and descriptive methods and discuss how and when each should be applied. Inferential methods are useful in that they allow one to generalize the results obtained from the analysis of a random data sample to a larger population. However, implicit in these methodologies are stringent requirements regarding the nature of the data sample. We show experimentally the effect of not complying with these requirements in the context of dependent bigram identification. We suggest that Fisher's exact test is the most appropriate test for making inferences from a random sample of bigram data, since this test imposes the fewest requirements regarding the nature of the data. When it is possible to study a population exhaustively, a descriptive approach is most appropriate: there is no need to infer the characteristics of a population when that population can be studied in its entirety. We introduce a new descriptive measure, minimum sensitivity, that can be used to identify dependent collocations in a population. Experimental results show that this descriptive measure compares favorably to previous approaches.
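
As an illustration of the inferential approach mentioned above, the following is a minimal sketch of a right-tailed Fisher's exact test on a 2x2 contingency table of bigram counts. It is not taken from the paper: the function name, the cell layout (n11 = occurrences of the bigram, n12 = first word followed by some other word, n21 = some other word followed by the second word, n22 = neither), and the choice of a one-sided test are assumptions made here for the sake of the example.

```python
from math import comb


def fisher_right_tail(n11: int, n12: int, n21: int, n22: int) -> float:
    """Right-tailed Fisher's exact test for a 2x2 table (illustrative sketch).

    Returns the probability, under independence and with the table margins
    held fixed, of seeing a bigram count at least as large as n11.
    The cell naming is an assumption of this sketch, not the paper's notation.
    """
    row1 = n11 + n12                    # occurrences of the first word
    col1 = n11 + n21                    # occurrences of the second word
    total = n11 + n12 + n21 + n22       # number of bigrams in the sample

    # Hypergeometric probabilities share one denominator, so sum the
    # numerators with exact integer arithmetic and divide once at the end.
    denominator = comb(total, col1)
    largest_n11 = min(row1, col1)       # n11 cannot exceed either margin
    numerator = sum(
        comb(row1, k) * comb(total - row1, col1 - k)
        for k in range(n11, largest_n11 + 1)
    )
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call.
    p_value = fisher_right_tail(3, 1, 1, 3)
    print(f"right-tailed p-value: {p_value:.4f}")
```

A small p-value would suggest that the two words co-occur more often than independence predicts. Because the probabilities are computed exactly from the hypergeometric distribution, the sketch does not rely on the large-sample approximations that other tests need, which is the property the abstract appeals to.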