Spreadsheet-based complex data transformation

Millions of users rely on spreadsheets as a routine, all-purpose data management tool, and external applications and services increasingly need to consume spreadsheet data. In this paper, we investigate the problem of transforming spreadsheet data into the structured formats required by these applications and services. Unlike prior methods, our approach embeds the transformation logic in a familiar and expressive spreadsheet-like formula mapping language. The language supports, via formulas, the popular transformation patterns of transformation languages and mapping tools that are relevant to spreadsheet-based data transformation. Consequently, the language avoids cluttering the source spreadsheets with transformations and proves helpful when multiple schemas are targeted. We implemented a prototype and evaluated it through experiments in a real application; the experimental results confirmed the benefits of our approach.
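The idea of keeping formula-based mappings separate from the source sheet can be illustrated with a minimal sketch. This is not the mapping language proposed in the paper, whose syntax the abstract does not give. It is a hypothetical Python analogue in which each target field is defined by a formula over a source row. The sample sheet, the column names, the two target schemas, and the `transform` helper are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Formula = Callable[[Row], Any]

# Hypothetical source sheet: one dict per row, keyed by column header.
sheet: List[Row] = [
    {"First": "Ada", "Last": "Lovelace", "Hours": 12, "Rate": 50.0},
    {"First": "Alan", "Last": "Turing", "Hours": 8, "Rate": 60.0},
]

# Mappings live outside the sheet: target field -> formula over a source row.
# Two hypothetical target schemas are defined over the same source data.
invoice_mapping: Dict[str, Formula] = {
    "name": lambda r: f"{r['First']} {r['Last']}",  # concatenation formula
    "amount": lambda r: r["Hours"] * r["Rate"],     # arithmetic formula
}
timesheet_mapping: Dict[str, Formula] = {
    "employee": lambda r: r["Last"].upper(),
    "hours": lambda r: r["Hours"],
}


def transform(rows: List[Row], mapping: Dict[str, Formula]) -> List[Row]:
    """Apply every formula in the mapping to every source row."""
    return [
        {field: formula(row) for field, formula in mapping.items()}
        for row in rows
    ]


invoices = transform(sheet, invoice_mapping)
timesheets = transform(sheet, timesheet_mapping)
```

Because both mappings read from the same unmodified `sheet`, adding a further target schema in this sketch means adding another mapping, not more columns or formulas in the source data.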

[7] Hung Thanh Vu, et al. Spreadsheet-based complex data transformation, 2011.
