Spreadsheets for stream processing with unbounded windows and partitions

Stream processing is a computational paradigm that allows the analysis of live data streams as they are produced. This paper describes a programming model, based on enhancements to spreadsheets, that enables users with limited programming experience to participate directly in the development of complex streaming applications. The programming model augments a conventional spreadsheet with streaming features that permit operating over unbounded data sets despite the finite interface provided by the spreadsheet. The new constructs include time-based windows and partitioning. We introduce a spreadsheet compiler that generates C++ code to achieve integration with existing stream processing systems. Our experimental study illustrates the expressivity of the new features and finds that our implementation is between 8x slower and 2x faster than hand-written stream programs.
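
To make the two constructs named above concrete, the following is a minimal sketch of a time-based window combined with partitioning, written in plain Python. It is not the paper's spreadsheet syntax or its generated C++ code. The class name `PartitionedTimeWindow`, the `span` parameter, and the choice of a running sum are assumptions made purely for illustration, and the sketch assumes timestamps arrive in order within each partition.

```python
from collections import defaultdict, deque


class PartitionedTimeWindow:
    """Hypothetical helper: one sliding time window per partition key."""

    def __init__(self, span):
        self.span = span  # window length, in the same units as the timestamps
        self.windows = defaultdict(deque)  # key -> deque of (timestamp, value)

    def push(self, key, timestamp, value):
        """Add a tuple to the window of its partition and return the window sum."""
        window = self.windows[key]
        window.append((timestamp, value))
        # Evict tuples that fell out of the window (timestamp - span, timestamp].
        # Assumes timestamps are non-decreasing within a partition.
        while window and window[0][0] <= timestamp - self.span:
            window.popleft()
        return sum(v for _, v in window)


if __name__ == "__main__":
    w = PartitionedTimeWindow(span=10)
    for key, ts, val in [("a", 0, 1), ("a", 5, 2), ("b", 6, 10), ("a", 12, 4)]:
        print(key, ts, w.push(key, ts, val))
```

Each partition keeps only the tuples inside its current window, so the per-key state stays bounded by the window contents even though the input stream itself is unbounded.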
