Parsing Internal Noun Phrase Structure with Collins' Models

Collins’ widely used parsing models treat noun phrases (NPs) differently from other constituents. We investigate these differences using the recently released internal NP bracketing data (Vadas and Curran, 2007a). Altering the structure of the Treebank, as this data does, has a number of consequences, because parsers built on Collins’ models assume that their training and test data are structured like the Penn Treebank. Our results demonstrate that Collins’ models have difficulty adapting to this new NP structure, and that parsers using these models make mistakes as a result. This emphasises how important treebank structure is, and how large an influence it can have.