Big data: a research agenda

Interest in Big Data has grown considerably in recent years, driven mainly by a wide range of research problems closely tied to real-life applications and systems, such as representing, modeling, processing, querying, and mining massive, distributed, large-scale repositories (most of them unstructured). Motivated by this trend, in this paper we discuss three important aspects of Big Data research: OLAP over Big Data, Big Data Posting, and Privacy of Big Data. We also outline future research directions, thereby implicitly defining a research agenda for addressing upcoming challenges in this field.
