Enhancing First-Pass Attachment Prediction

This paper explores the convergence between cognitive modeling and engineering solutions to the parsing problem in NLP. Natural language presents many sources of ambiguity, and several theories of human parsing claim that ambiguity is resolved by drawing on past linguistic experience. In this paper we analyze and refine a connectionist paradigm, Recursive Neural Networks (RNNs), capable of processing acyclic graphs, to perform supervised learning on syntactic trees extracted from a large corpus of parsed sentences. Following a widely accepted hypothesis in psycholinguistics, we assume an incremental parsing process (one word at a time) that maintains a connected partial parse tree at all times. By implementing a parsing simulation procedure, we collect a large amount of data showing the viability of the RNN as an informant for the disambiguation process. We analyze what kind of information the connectionist system exploits to resolve different sources of ambiguity, and we examine how the generalization performance of the system is affected by tree complexity and by the frequency of specific subtrees. Finally, we propose some enhancements to the architecture in order to achieve better prediction accuracy.
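To make the core idea concrete, the sketch below illustrates, in a minimal and purely hypothetical form, how a recursive network can compose a vector for each node of a candidate partial parse tree bottom-up and use a scalar score to rank alternative attachment sites for an incoming word. This is not the architecture studied in the paper; all names, dimensions, and the toy PP-attachment example are illustrative assumptions, and the parameters are left untrained.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 16           # size of node representations (illustrative)
MAX_CHILDREN = 2   # assume binarized trees for simplicity

class Node:
    """A node of a (partial) parse tree: a syntactic label and its children."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

# Hypothetical parameters; a real system would learn these with
# backpropagation through structure on trees from a parsed corpus.
label_embedding = {}
W_child = rng.normal(scale=0.1, size=(MAX_CHILDREN, DIM, DIM))
b = np.zeros(DIM)
w_score = rng.normal(scale=0.1, size=DIM)

def embed(label):
    # One vector per syntactic category (NP, VP, PP, ...), created on demand.
    if label not in label_embedding:
        label_embedding[label] = rng.normal(scale=0.1, size=DIM)
    return label_embedding[label]

def encode(node):
    """Bottom-up composition: a node's vector depends on its label and,
    recursively, on the vectors of its children."""
    h = embed(node.label) + b
    for i, child in enumerate(node.children[:MAX_CHILDREN]):
        h = h + W_child[i] @ encode(child)
    return np.tanh(h)

def attachment_score(tree):
    """Scalar preference for one candidate partial tree; in incremental
    parsing, the competing attachments would be ranked by this score."""
    return float(w_score @ encode(tree))

# Toy PP-attachment ambiguity: attach the PP under the verb phrase or
# under the object noun phrase, then prefer the higher-scoring tree.
vp_attach = Node("S", [Node("NP"),
                       Node("VP", [Node("VP"), Node("PP")])])
np_attach = Node("S", [Node("NP"),
                       Node("VP", [Node("V"),
                                   Node("NP", [Node("NP"), Node("PP")])])])

candidates = {"VP attachment": vp_attach, "NP attachment": np_attach}
print({k: round(attachment_score(t), 4) for k, t in candidates.items()})
print("preferred (untrained, so arbitrary):",
      max(candidates, key=lambda k: attachment_score(candidates[k])))
```

With trained parameters, the score would reflect the frequency of subtree configurations seen in the corpus, which is what lets an experience-based model of this kind act as an informant for first-pass attachment decisions.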