LDA-Frames: An Unsupervised Approach to Generating Semantic Frames

In this paper we introduce a novel approach to identifying semantic frames from semantically unlabelled text corpora. Many frame formalisms exist, but most of them require that all frames be created manually and that the set of semantic roles be predefined. The LDA-Frames approach, based on Latent Dirichlet Allocation, avoids both problems by employing statistics on a syntactically tagged corpus. The only information that must be given is the number of semantic frames and the number of semantic roles to be identified. The power of LDA-Frames is shown first on a small sample corpus and then on the British National Corpus.