Towards Semantic Dataset Profiling

The web of data is growing constantly, both in terms of size and impact. A potential data publisher needs to dispose with recapitulative information on the datasets available on the web, so that she can easily identify where to look for the resources to which her data relates. This information will help discover candidate datasets for interlinking. In that context, we investigate the problem of dataset profiling. We define a dataset profile as a set of characteristics, both semantic and statistical, that allow to describe in the best possible way a dataset by taking into account the multiplicity of domains and vocabularies on the web of data.