Paper information - New learning models for robust reference resolution

New learning models for robust reference resolution

An important challenge for the automatic understanding of natural language texts is the correct computation of the discourse entities that are mentioned therein, such as persons, locations, and abstract objects. The problem of mapping linguistic expressions into these underlying entities is known as reference resolution. Recent years of research in computational reference resolution have seen the emergence of machine learning approaches, which are much more robust and better performing than their rule-based predecessors. Unfortunately, perfect performance is still out of reach for these systems. Broadly defined, the aim of this dissertation is to improve on these existing systems by exploring more advanced machine learning models that (i) encode the structure of the problem more adequately and (ii) make better use of the information sources given to the system.

Starting with the sub-task of anaphora resolution, we propose to model this task as a ranking problem rather than as a classification problem (as is done in existing systems). A ranker offers a potentially better way to model this task by directly including the comparison between antecedent candidates as part of its training criterion. We find that the ranker delivers significant performance improvements over classification-based systems, and is also computationally more attractive in terms of training time and learning rate than its rivals.

The ranking approach is then extended to the larger problem of coreference resolution. The main goal is to see whether the better antecedent selection capabilities offered by the ranking approach can also benefit the larger coreference resolution task. The extension is two-fold. First, we design various specialized ranker models for different types of referential expressions (e.g., pronouns, definite descriptions, proper names). Besides its linguistic appeal, this division of labor also has the potential of learning better model parameters.
Second, we augment these rankers with a model that determines the discourse status of mentions and that is used to filter out the “non-anaphoric” mentions. As shown by various experiments, this combined strategy results in significant performance improvements over the single-model, classification-based approach on the three main coreference metrics: the standard MUC metric, but also the more representative B3 and CEAF metrics.

Finally, we show how the task of coreference resolution can be recast as a linear optimization problem. In particular, we use the framework of Integer Linear Programming (ILP) to: (i) combine the predictions of three local models (namely, a standard pairwise coreference classifier, a discourse status classifier, and a named entity classifier) in a joint, global inference, and (ii) integrate various other global constraints (such as transitivity constraints) to better capture the dependencies between coreference decisions. Tested on the ACE datasets, our ILP formulations deliver significant F-score improvements over both a standard pairwise model and various models that employ the discourse status and named entity classifiers in a cascade. These improvements were again found to hold across the three different evaluation metrics: MUC, B3, and CEAF. The fact that B3 and CEAF scores were also improved is of particular importance, since these two metrics are much less lenient than MUC in terms of precision errors.
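To make the role of the transitivity constraints concrete, the following is a minimal sketch, not the dissertation's actual formulation. It assumes only a pairwise coreference classifier that outputs probabilities strictly between 0 and 1 for every mention pair, and it omits the discourse status and named entity variables. For readability it enumerates all binary link assignments by brute force instead of calling an ILP solver, so it only suits a handful of mentions. The function name `best_consistent_links` and the example probabilities are hypothetical.

```python
from itertools import combinations, product
from math import log


def best_consistent_links(probs):
    """Pick the most probable set of coreference links that is transitive.

    probs maps every mention pair (i, j) with i < j to an assumed
    pairwise probability of coreference, strictly between 0 and 1.
    """
    mentions = sorted({m for pair in probs for m in pair})
    pairs = list(combinations(mentions, 2))
    best_score, best_links = float("-inf"), None

    # Enumerate every 0/1 assignment to the link variables (brute force,
    # standing in for the search an ILP solver would perform).
    for bits in product((0, 1), repeat=len(pairs)):
        links = dict(zip(pairs, bits))

        # Transitivity: within any triple, two links imply the third,
        # so exactly two active links out of three is inconsistent.
        if any(links[(i, j)] + links[(j, k)] + links[(i, k)] == 2
               for i, j, k in combinations(mentions, 3)):
            continue

        # Objective: log-likelihood of the assignment under the
        # pairwise classifier.
        score = sum(log(probs[p]) if links[p] else log(1.0 - probs[p])
                    for p in pairs)
        if score > best_score:
            best_score, best_links = score, links
    return best_links


# Hypothetical probabilities for three mentions.
pairwise = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.3}
print(best_consistent_links(pairwise))
```

With these illustrative numbers, thresholding each pair independently at 0.5 would link mentions 0 and 1 and mentions 1 and 2 but not 0 and 2, which is not a valid partition into entities. The constrained search instead places all three mentions in one entity, because that is the highest-scoring assignment that respects transitivity.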