A Vector Model for Syntagmatic and Paradigmatic Relatedness

This paper introduces context digests, high-dimensional real-valued representations of the typical left and right contexts of a word. The initial entries of a context digest are formed from the word’s close left and right neighbors. A singular value decomposition (SVD) then reduces the dimensionality of the space so that subsequent processing is efficient. Unlike similar techniques, the method requires no preprocessor such as a parser. Context digests capture both syntagmatic and paradigmatic relations between words: how typically the words occur as neighbors and how well they can substitute for each other. We apply context digests to identifying collocations, to assessing the similarity of the arguments of different verbs, and to clustering occurrences of adjectives and verbs according to the words they modify in context.
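
The abstract does not spell out the exact construction, so the following is only a minimal sketch under stated assumptions, not Schütze's procedure. It assumes a window of one token on each side, raw bigram counts with no weighting, and a dense `numpy.linalg.svd` in place of the sparse solvers a real vocabulary would need. Each singular value is split evenly between a hypothetical "right" digest (what tends to follow a word) and "left" digest (what tends to precede it). The dot product of word a's right digest with word b's left digest then equals the rank-k estimate of how often b follows a, a syntagmatic score. The cosine of the concatenated digests serves as a paradigmatic score. The function names `context_digests`, `syntagmatic` and `paradigmatic` are illustrative only.

```python
import numpy as np


def bigram_counts(tokens):
    """Count how often word a is immediately followed by word b (assumed window of 1)."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for left, right in zip(tokens, tokens[1:]):
        counts[index[left], index[right]] += 1.0
    return index, counts


def context_digests(tokens, k=2):
    """Hypothetical helper: rank-k right and left digests from one SVD.

    Row a of `counts` is word a's right-neighbour profile and column b is
    word b's left-neighbour profile, so a single SVD compresses both.
    """
    index, counts = bigram_counts(tokens)
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    k = min(k, len(s))
    root = np.sqrt(s[:k])          # split each singular value between the two sides
    right = u[:, :k] * root        # summarises what typically follows each word
    left = vt[:k].T * root         # summarises what typically precedes each word
    return index, right, left


def syntagmatic(index, right, left, a, b):
    """Rank-k estimate of how often b directly follows a."""
    return float(right[index[a]] @ left[index[b]])


def paradigmatic(index, right, left, a, b):
    """Cosine similarity of the combined left and right digests of a and b."""
    va = np.concatenate([left[index[a]], right[index[a]]])
    vb = np.concatenate([left[index[b]], right[index[b]]])
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
```

On the toy corpus `the cat sat . the dog sat . a cat ran . a dog ran .`, setting `k` to the vocabulary size makes the reconstruction exact. In that case `syntagmatic(index, right, left, "the", "cat")` is approximately 1.0 (one observed bigram), and `paradigmatic(index, right, left, "cat", "dog")` is approximately 1.0 because the two words have identical neighbours. A smaller `k` gives up that exactness, and the low-rank approximation can then assign nonzero scores to plausible pairs that were never observed.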

Hinrich Schütze