Paper Information - The VADA Architecture for Cost-Effective Data Wrangling

The VADA Architecture for Cost-Effective Data Wrangling

Data wrangling, the multi-faceted process by which the data required by an application is identified, extracted, cleaned and integrated, is often cumbersome and labor-intensive. In this paper, we present an architecture that supports a complete data wrangling lifecycle, orchestrates components dynamically, builds on automation wherever possible, is informed by whatever data is available, refines automatically produced results in the light of feedback, takes into account the user's priorities, and supports data scientists with diverse skill sets. The architecture is demonstrated in practice for wrangling property sales and open government data.
