A Comparison of Rule-Based and Machine Learning Methods for Identifying Non-nominal It

The pronoun it is used in a variety of non-nominal ways. Identifying non-nominal pronouns is important in information retrieval, machine translation and automatic summarisation. Because previous work has tackled only a subset of these non-nominal uses, a machine learning method for identifying all instances of non-nominal it is presented. The machine learning method is compared with a rule-based approach, and the performance of each implementation is evaluated. The construction of an annotated corpus and of the training data is also described.