HOBBIT: Holistic Benchmarking of Big Linked Data

Project Overview. Linked Data has gained significant momentum in recent years and is now used at industrial scale in many sectors, including bio-medicine, plant manufacturing, smart cities, content creation and management, data journalism, asset management and storage, banking and many more. In these sectors, an increasingly large amount (volume) of rapidly changing (velocity) data in different formats, ranging from unstructured to structured (variety), needs to be integrated (vocabulary), refined, curated (veracity) or even bundled (value) to serve as the data basis for large-scale applications. However, deciding on the right framework to build a given pipeline is still a demanding endeavour. Works such as GERBIL [3] and BAT [2] show that the comparability of Linked Data solutions for Named Entity Recognition and Linking is undermined because frameworks are (1) evaluated on different datasets, (2) scored with different implementations of homonymous measures, and (3) run on non-standardized hardware. As a result, the results presented in different publications are difficult to compare. This problem is not limited to entity recognition and linking: it holds across the whole Linked Data lifecycle. HOBBIT aims to address these drawbacks by providing a family of industry-relevant benchmarks for the Big Linked Data value chain through a generic evaluation platform deployed on standardized hardware. In more detail, we aim to: