Results of the OAEI 2007 Library Thesaurus Mapping Track

The National Library of the Netherlands (KB) maintains two large collections of books: the Deposit Collection, containing all the Dutch printed publications (one million items), and the Scientific Collection, with about 1.4 million books mainly about the history, language and culture of the Netherlands. Each collection is indexed using its own controlled vocabulary. The Scientific Collection is described using the GTT thesaurus, a huge vocabulary containing 35,194 general concepts, ranging from Wolkenkrabbers (Skyscrapers) to Verzorging (Care). The books in the Deposit Collection are mainly indexed against the Brinkman thesaurus, which contains a large set of headings (5,221) for describing the overall subjects of books. Both thesauri have similar coverage (2,895 concepts have exactly the same label) but differ in granularity.

For each concept in the two thesauri, the usual detailed lexical information is provided: preferred labels (each concept has exactly one), synonyms (961 for Brinkman, 14,607 for GTT), extra hidden labels (134 for Brinkman, a couple of thousand for GTT) and scope notes (6,236 for GTT, 192 for Brinkman). The language of both thesauri is Dutch, which makes this track well suited to testing alignment in a non-English setting.

The two thesauri also provide structural information for their concepts, in the form of broader and related links. However, GTT contains only 15,746 hierarchical broader links and 6,980 associative related links for its 35,194 concepts. Within the Brinkman thesaurus, there are 4,572 hierarchical links and 1,855 associative ones. On average, one can expect at most one parent per concept, for an average depth of 1 and 2 for GTT and Brinkman, respectively. The structural information available in this case is therefore very poor.

For the purpose of the OAEI campaign, the two thesauri were made available in SKOS format. OWL versions were also provided, according to the (lossy) conversion rules detailed on the track page.
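
The link counts quoted above can be turned into links-per-concept ratios with a few lines of arithmetic, which makes the sparseness of the structure concrete. The sketch below also illustrates the kind of preferred-label comparison behind the "same label" figure. It is only an illustration: the function names, the concept identifiers and the toy label "Geschiedenis" are hypothetical, the case-insensitive, whitespace-collapsing normalization is an assumption (the text only says the labels are exactly the same), and nothing here reproduces how the track organizers or participants actually computed their numbers.

```python
# Figures quoted in the text above (concept and link counts per thesaurus).
STATS = {
    "GTT": {"concepts": 35194, "broader": 15746, "related": 6980},
    "Brinkman": {"concepts": 5221, "broader": 4572, "related": 1855},
}


def links_per_concept(stats):
    """Return, per thesaurus, the number of broader/related links per concept."""
    return {
        name: {kind: s[kind] / s["concepts"] for kind in ("broader", "related")}
        for name, s in stats.items()
    }


def normalize(label):
    """Assumed normalization: lowercase and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def same_label_pairs(labels_a, labels_b):
    """Pair concepts whose preferred labels are equal after normalization.

    Both arguments map a concept identifier to its single preferred label.
    """
    index = {}
    for concept_id, label in labels_b.items():
        index.setdefault(normalize(label), []).append(concept_id)
    return [
        (id_a, id_b)
        for id_a, label in labels_a.items()
        for id_b in index.get(normalize(label), [])
    ]


ratios = links_per_concept(STATS)
for name, r in ratios.items():
    print(f"{name}: {r['broader']:.2f} broader, {r['related']:.2f} related links per concept")

# Toy data with hypothetical identifiers; not taken from the real thesauri files.
gtt = {"gtt:1": "Wolkenkrabbers", "gtt:2": "Verzorging"}
brinkman = {"br:1": "verzorging", "br:2": "Geschiedenis"}
pairs = same_label_pairs(gtt, brinkman)
print(pairs)
```

Both ratios for broader links stay below one, which is consistent with the remark that a concept has at most one parent on average.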