Classifying Taxonomic Relations between Pairs of Wikipedia Articles

Natural language generation systems rely on taxonomic thesauri for tasks such as lexical choice and aggregation. WordNet is one such taxonomy, but it is limited in size. Motivated by the needs of a generation system in the scientific literature domain, we present a method for building a taxonomic thesaurus from Wikipedia articles, where each article represents a potential concept in the taxonomy. We propose framing the problem of creating a taxonomy as a classification task of the potential relations between individual Wikipedia article pairs, and show that a supervised algorithm can achieve high precision in this task with very little training data.