LEARNING SUBCATEGORIZATION FRAMES FROM CORPORA : A CASE STUDY FOR MODERN GREEK

Certain Natural Language Processing (NLP) applications such as parsing and semantic processing require complete lexicons that provide subcategorization information for a word of interest, i.e. the necessary information about the set(s) of syntactic constituents the word must combine with, in order for its meaning to be fully expressed. Modern Greek presents high flexibility in the allowable orderings of its syntactic phrases as well as rich variety of syntactic constructions, which may function as arguments to verbs. In this paper, we describe a set of machine learning techniques used to automatically extract subcategorization frames of verbs from corpora.