Automatic identification of relevant chemical compounds from patents. The training corpus.