Towards Entity Summarisation on Structured Web Markup

Embedded markup based on Microdata, RDFa, and Microformats has become prevalent on the Web and constitutes an unprecedented source of data. However, statements extracted from markup differ fundamentally from traditional RDF graphs: entity descriptions are flat, facts are highly redundant and granular, and co-references are very frequent, yet explicit links are missing. As a result, typical entity-centric tasks such as retrieval and summarisation cannot be tackled sufficiently with state-of-the-art methods. We present an entity summarisation approach that overcomes these issues through a combination of entity retrieval and summarisation techniques geared towards the specific challenges associated with embedded markup. We perform a preliminary evaluation on a subset of the Web Data Commons dataset and show improvements over existing entity retrieval baselines. In addition, an investigation into the coverage and complementarity of facts from the constructed entity summaries shows potential for aiding tasks such as knowledge base population.