Disambiguation of English PP Attachment using Multilingual Aligned Data

Prepositional phrase attachment (PP attachment) is a major source of ambiguity in English. It poses a substantial challenge to Machine Translation (MT) between English and languages that are not characterized by PP attachment ambiguity. In this paper we present an unsupervised, bilingual, corpus-based approach to the resolution of English PP attachment ambiguity. As data we use aligned linguistic representations of the English and Japanese sentences from a large parallel corpus of technical texts. The premise of our approach is that with large aligned, parsed, bilingual (or multilingual) corpora, languages can learn non-trivial linguistic information from one another with high accuracy. We contend that our approach can be extended to linguistic phenomena other than PP attachment.