DISEMINER: A Distributional-Semantics Inference Maker

Abstract: The purpose of the DISEMINER system is to explore the relation between lexical distribution criteria and semantics. It is hoped that the system, in its learning mode, will be useful in collecting data for deriving semotactic rules in a stratificational grammar. The system, written in ALGOL 20 and operational on the G-21 computer, is capable of learning distribution classes of lexical items by processing text, and of using distributional criteria to answer questions that go beyond the context of the text processed. The methodology follows a line of research that was considered, but never pursued, in early work on the SYNTHEX project. Distributional information is stored in a dependency structure that differs from the SYNTHEX version in that dependency relations among stem types, rather than stem tokens, are stored in matrix format. That is, each stem is listed only once, and its dependency relations in all text processed by the system are associated with that single entry. (In the SYNTHEX system, separate dependencies are tabulated for each occurrence of a stem.) The stored relations include all possible transitive paths as well as direct ones. Because dependency analysis is weakly equivalent to phrase structure analysis, this data structure can be viewed as a tabulation of the distributional potential of stems with respect to phrase structure criteria rather than criteria of linear contiguity. (Author)
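
Illustration (not part of the original abstract): the type-level dependency matrix described above can be sketched in a few lines of modern code. The sketch below is in Python, not the ALGOL 20 of the original system. Every name in it (build_type_matrix, transitive_closure, the sample sentences) is hypothetical. It assumes that the text has already been parsed into (governor stem, dependent stem) pairs, and that transitive paths are computed over the merged type-level relations. The abstract does not say how DISEMINER itself derives them.

```python
def build_type_matrix(parsed_sentences):
    """Merge token-level dependency pairs into one Boolean matrix over stem types.

    parsed_sentences: list of sentences, each a list of
    (governor_stem, dependent_stem) pairs (an assumed input format).
    """
    # Each stem type is listed exactly once, however often it occurs in the text.
    stems = sorted({stem for sent in parsed_sentences for pair in sent for stem in pair})
    index = {stem: i for i, stem in enumerate(stems)}
    n = len(stems)
    direct = [[False] * n for _ in range(n)]
    for sent in parsed_sentences:
        for governor, dependent in sent:
            direct[index[governor]][index[dependent]] = True
    return stems, index, direct


def transitive_closure(direct):
    """Add every transitive path to the direct relations (Warshall's algorithm)."""
    n = len(direct)
    reach = [row[:] for row in direct]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


# Two toy sentences: "the dog barks" and "the cat sleeps".
sentences = [
    [("bark", "dog"), ("dog", "the")],
    [("sleep", "cat"), ("cat", "the")],
]
stems, index, direct = build_type_matrix(sentences)
reach = transitive_closure(direct)

# "bark" governs "the" only through "dog": absent from direct, present in reach.
print(direct[index["bark"]][index["the"]], reach[index["bark"]][index["the"]])
```

In this sketch the stem "the" occupies a single row and column even though it occurs in both sentences, which is the type-versus-token distinction drawn with SYNTHEX. Stems with similar rows and columns (here "dog" and "cat", both of which govern "the") would be natural candidates for a shared distribution class.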