Is Knowledge-Free Induction of Multiword Unit Dictionary Headwords a Solved Problem?

We seek a knowledge-free method for inducing multiword units from text corpora for use as machine-readable dictionary headwords. We provide two major evaluations of nine existing collocation-finders and illustrate the continuing need for improvement. We use Latent Semantic Analysis to make modest gains in performance, but we show the significant challenges encountered in trying this approach.

Daniel Jurafsky | Patrick Schone
