WordNet and Distributional Analysis: A Class-based Approach to Lexical Discovery

Statistical studies of natural language data commonly use measures of lexical association, such as the information-theoretic measure of mutual information, to extract useful relationships between words (e.g. [Church et al., 1989; Church and Hanks, 1989; Hindle, 1990]). For example, [Hindle, 1990] uses an estimate of mutual information to determine which nouns a verb can take as its subjects and objects, based on distributions found in a large corpus of naturally occurring text.
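As an illustration of the kind of association measure mentioned above, the following minimal sketch computes pointwise mutual information, log2 of P(v, n) / (P(v) P(n)), for verb-object pairs using simple relative-frequency estimates. The function name `pointwise_mutual_information` and the toy pairs are assumptions made for this example; they are not taken from the cited work, whose estimation details may differ.

```python
import math
from collections import Counter


def pointwise_mutual_information(pairs):
    """Return {(verb, noun): PMI in bits} from a list of (verb, noun) pairs.

    Probabilities are plain relative frequencies over the observed pairs
    (an assumption for this sketch; no smoothing is applied).
    """
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    noun_counts = Counter(noun for _, noun in pairs)
    total = len(pairs)

    scores = {}
    for (verb, noun), joint in pair_counts.items():
        p_joint = joint / total
        p_verb = verb_counts[verb] / total
        p_noun = noun_counts[noun] / total
        scores[(verb, noun)] = math.log2(p_joint / (p_verb * p_noun))
    return scores


# Invented toy data standing in for verb-object pairs extracted from a corpus.
toy_pairs = [
    ("drink", "wine"), ("drink", "water"), ("drink", "wine"),
    ("read", "book"), ("read", "paper"),
    ("open", "book"), ("open", "wine"), ("open", "door"),
]

scores = pointwise_mutual_information(toy_pairs)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

A positive score indicates that a verb and noun co-occur more often than their individual frequencies would predict, and a negative score indicates the opposite. With counts this small the estimates are unreliable, which is one reason such measures are applied to large corpora.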

[1]  B. MacWhinney.  The CHILDES project: tools for analyzing talk , 1992 .

[2]  Marti A. Hearst,et al.  Refining Automatically-Discovered Lexical Relations: Combining Weak Techniques for Stronger Results , 1992 .

[3]  R. Burchfield.  Frequency Analysis of English Usage: Lexicon and Grammar. By W. Nelson Francis and Henry Kučera with the assistance of Andrew W. Mackie. Boston: Houghton Mifflin. 1982. x + 561 , 1985 .

[4]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[5]  Kenneth Ward Church,et al.  Word Association Norms, Mutual Information, and Lexicography , 1989, ACL.

[6]  Robert L. Mercer,et al.  Class-Based n-gram Models of Natural Language , 1992, CL.

[7]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .

[8]  Roberto Basili,et al.  Computational Lexicons: the Neat Examples and the Odd Exemplars , 1992, ANLP.

[9]  Ronald Rosenfeld,et al.  Improvements in Stochastic Language Modeling , 1992, HLT.

[10]  Donald Hindle,et al.  Noun Classification From Predicate-Argument Structures , 1990, ACL.

[11]  Marti A. Hearst,et al.  A Method for Refining Automatically-Discovered Lexical Relations: Combining Weak Techniques for Stronger Results , 1992 .

[12]  Sydney Abbey,et al.  What is A “Method”? , 1991 .

[13]  Lalit R. Bahl,et al.  A tree-based statistical language model for natural language speech recognition , 1989, IEEE Trans. Acoust. Speech Signal Process..

[14]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[15]  Eric Brill,et al.  Deducing Linguistic Structure from the Statistics of Large Corpora , 1990, HLT.

[16]  David Yarowsky,et al.  Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora , 1992, COLING.

[17]  Kenneth Ward Church,et al.  Parsing, Word Associations and Typical Predicate-Argument Relations , 1989, HLT.

[18]  Lalit R. Bahl,et al.  A Maximum Likelihood Approach to Continuous Speech Recognition , 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  David Yarowsky,et al.  A method for disambiguating word senses in a large corpus , 1992, Comput. Humanit..

[20]  Frederick Jelinek,et al.  Interpolated estimation of Markov source parameters from sparse data , 1980 .