Learning and Inference for Hierarchically Split PCFGs

Treebank parsing can be seen as the search for an optimally refined grammar consistent with a coarse training treebank. We describe a method in which a minimal grammar is hierarchically refined using EM to give accurate, compact grammars. The resulting grammars are extremely compact compared to other high-performance parsers, yet the parser gives the best published accuracies on several languages, as well as the best generative parsing numbers in English. In addition, we give an associated coarse-to-fine inference scheme which vastly improves inference time with no loss in test set accuracy.