Symmetric and Asymmetric Aggregate Function in Massively Parallel Computing

Aggregation, used to summarize information, is important in many fields. In the big data era, processing aggregate functions in parallel is drawing researchers' attention. The aim of our work is to propose a generic framework that maps an arbitrary aggregation onto a generic algorithm and identifies when it can be executed efficiently on modern large-scale data-processing systems. We describe our preliminary results on classes of symmetric and asymmetric aggregations that can be mapped, in a systematic way, into efficient MapReduce-style algorithms.
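As a minimal illustration of the idea, and not the framework proposed in this work, the sketch below splits an aggregation into a per-partition summary, a merge step, and a finalization step, in the spirit of a MapReduce-style algorithm. It assumes that "symmetric" means the result does not depend on input order and that "asymmetric" means it does; all function names (`mean_partial`, `mean_merge`, `mean_finalize`, `parallel_mean`, `ordered_concat`) are hypothetical, and the partitions are processed sequentially here rather than on a real cluster.

```python
from functools import reduce


def mean_partial(chunk):
    # Map phase: summarize one partition as a (sum, count) pair.
    return (sum(chunk), len(chunk))


def mean_merge(a, b):
    # Combine phase: this merge is associative and commutative,
    # so partial results may be merged in any order.
    return (a[0] + b[0], a[1] + b[1])


def mean_finalize(state):
    # Final phase: turn the merged summary into the answer.
    total, count = state
    return total / count


def parallel_mean(partitions):
    # Assumes at least one partition and at least one value overall.
    partials = [mean_partial(p) for p in partitions]  # parallel in a real system
    return mean_finalize(reduce(mean_merge, partials))


def ordered_concat(partitions):
    # Order-sensitive example: string concatenation is associative
    # but not commutative, so partials must be merged in partition order.
    partials = ["".join(p) for p in partitions]
    return reduce(lambda a, b: a + b, partials)


if __name__ == "__main__":
    print(parallel_mean([[1, 2, 3], [4, 5], [6]]))
    print(ordered_concat([["a", "b"], ["c"], ["d", "e"]]))
```

In this sketch the average can be merged in any partition order, while the concatenation only gives the intended result when the partition order is preserved, which is one way the two classes place different demands on the execution engine.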
