Privacy or Security?: Take A Look And Then Decide

The big data paradigm is currently the leading paradigm for data production and management. In specialized fields such as cybersecurity, new information is generated at very high rates. As a result, the events to be studied may occur too fast to be analyzed effectively in real time. For example, detecting possible security threats may require screening millions of records in a high-speed data stream. A viable way to mitigate this problem is to use data compression to reduce the amount of data to be analyzed. In this paper we propose the use of privacy-preserving histograms, which provide approximate answers to 'safe' queries, for analyzing data in the cybersecurity scenario without compromising individuals' privacy, and we describe our system, which has been used in a real-life scenario.
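To make the compression idea concrete, the following minimal sketch (not the authors' system) summarizes a collection of numeric values with an equi-width histogram and answers range-count queries approximately. It assumes values are spread uniformly inside each bucket (the continuous value assumption). The class `EquiWidthHistogram`, its methods, and the sample data are hypothetical. The sketch covers only data reduction and approximate answering. It does not model the privacy-preserving construction or the notion of 'safe' queries proposed in the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EquiWidthHistogram:
    """Hypothetical equi-width histogram over the domain [lo, hi)."""
    lo: float
    hi: float
    counts: List[int]

    @property
    def width(self) -> float:
        # All buckets have the same width.
        return (self.hi - self.lo) / len(self.counts)

    @classmethod
    def build(cls, values: Sequence[float], lo: float, hi: float,
              n_buckets: int) -> "EquiWidthHistogram":
        if n_buckets <= 0 or hi <= lo:
            raise ValueError("need n_buckets > 0 and hi > lo")
        counts = [0] * n_buckets
        width = (hi - lo) / n_buckets
        for v in values:
            if not (lo <= v < hi):
                raise ValueError(f"value {v} outside [{lo}, {hi})")
            # Map the value to its bucket; min() guards against rounding at the top edge.
            idx = min(int((v - lo) / width), n_buckets - 1)
            counts[idx] += 1
        return cls(lo, hi, counts)

    def range_count(self, a: float, b: float) -> float:
        """Estimate how many values fall in [a, b), assuming each bucket's
        count is spread uniformly across the bucket."""
        a, b = max(a, self.lo), min(b, self.hi)
        if a >= b:
            return 0.0
        w = self.width
        total = 0.0
        for i, c in enumerate(self.counts):
            b_lo = self.lo + i * w
            b_hi = b_lo + w
            overlap = min(b, b_hi) - max(a, b_lo)
            if overlap > 0:
                # Take the fraction of the bucket covered by the query.
                total += c * overlap / w
        return total


# Hypothetical data: durations (seconds) of ten network connections.
durations = [1, 2, 3, 12, 15, 27, 31, 33, 38, 39]
h = EquiWidthHistogram.build(durations, lo=0, hi=40, n_buckets=4)
print(h.counts)              # [3, 2, 1, 4]
print(h.range_count(5, 15))  # 2.5 (exact answer: 1)
```

Only the four bucket counts are stored instead of the ten raw values, which is the source of the compression. Queries aligned with bucket boundaries, such as `range_count(0, 20)`, are answered exactly. Queries that cut through buckets are estimated. Fewer buckets give stronger compression but coarser answers.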
