A Demo of the Data Civilizer System

Finding the data relevant to a specific task among the numerous data sources available in an organization is daunting. The difficulty stems not only from the number of sources where the data of interest may reside, but also from the data being scattered across the enterprise and typically dirty and inconsistent. In practice, data scientists routinely report that the majority (more than 80%) of their effort is spent finding, cleaning, integrating, and accessing the data of interest for the task at hand. We propose to demonstrate DATA CIVILIZER, which eases the pain of analyzing data "in the wild". DATA CIVILIZER is an end-to-end big data management system with components for data discovery, data integration and stitching, data cleaning, and querying data from a large variety of storage engines, running in large enterprises.
