Accurate Methods for the Statistics of Surprise and Coincidence

Much work has been done on the statistical analysis of text. In some cases reported in the literature, inappropriate statistical methods have been used, and the statistical significance of results has not been addressed. In particular, asymptotic normality assumptions have often been used unjustifiably, leading to flawed results. This assumption of a normal distribution limits the ability to analyze rare events. Unfortunately, rare events do make up a large fraction of real text. However, more applicable methods based on likelihood ratio tests are available that yield good results with relatively small samples. These tests can be implemented efficiently, and have been used for the detection of composite terms and for the determination of domain-specific terms. In some cases, these measures perform much better than the methods previously used. In cases where traditional contingency table methods work well, the likelihood ratio tests described here give nearly identical results. This paper describes the basis of a likelihood-ratio measure that can be applied to the analysis of text.
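
The contrast drawn in the abstract can be illustrated on a 2x2 contingency table of word-pair counts, the usual setting for composite-term detection. The following is a minimal Python sketch, not the paper's own code: it assumes the common formulation in which the log-likelihood ratio statistic -2 log λ for a 2x2 table equals G² = 2 Σ O ln(O/E), and compares it with Pearson's X², which relies on a normal approximation. The function names (`g_squared`, `pearson_chi2`, `p_value_1df`) and the counts are hypothetical illustrations.

```python
import math

# Hypothetical 2x2 table for a word pair (A, B), laid out as
#   [[count(A B),      count(A, not B)],
#    [count(not A, B), count(not A, not B)]]


def _expected(table, i, j):
    """Expected count for cell (i, j) under independence: row_i * col_j / N."""
    n = sum(sum(row) for row in table)
    row_total = sum(table[i])
    col_total = table[0][j] + table[1][j]
    return row_total * col_total / n


def g_squared(table):
    """Log-likelihood ratio statistic G^2 = -2 log(lambda) = 2 * sum O * ln(O / E)."""
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # convention: 0 * ln(0) = 0, so empty cells add nothing
                g += observed * math.log(observed / _expected(table, i, j))
    return 2.0 * g


def pearson_chi2(table):
    """Pearson's X^2 = sum (O - E)^2 / E, whose chi-squared limit assumes normality."""
    x2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = _expected(table, i, j)
            if expected > 0:  # E is 0 only when a whole row or column is empty
                x2 += (table[i][j] - expected) ** 2 / expected
    return x2


def p_value_1df(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    common = [[100, 200], [300, 400]]  # moderate counts in every cell
    rare = [[1, 0], [0, 99999]]        # a pair seen once in 100000 events
    for name, t in (("common", common), ("rare", rare)):
        g2, x2 = g_squared(t), pearson_chi2(t)
        print(f"{name}: G2={g2:.3f} (p={p_value_1df(g2):.2g}), X2={x2:.3f}")
```

With these made-up counts the two statistics nearly agree on the common table (G² is about 8.04, X² about 7.94), while for the rare pair X² is about 100000 and G² only about 25. This shows how a normality-based statistic can grossly overstate the significance of a single co-occurrence. Both statistics are still compared against a chi-squared distribution with one degree of freedom here, which is itself an asymptotic assumption.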

Ted Dunning