On Enriching User-Centered Data Integration Schemas in Service Lakes

In the Big Data era, companies are moving away from traditional data-warehouse solutions, which rely on expensive and time-consuming ETL (Extract-Transform-Load) processes, towards data lakes, which can be viewed as storage repositories holding vast amounts of raw data. In this paper, we consider the recurrent situation in which a user has a local dataset that is not sufficient for processing the queries of interest to them. In this context, we show how the data lake, or more specifically the service lake, since we focus on data-providing services, can be leveraged to enrich the local dataset with the concepts needed to process the user's queries. Furthermore, we present the algorithms we have developed for this purpose and showcase how our solution works using a case study.