Towards the evaluation of time series protection methods

The goal of statistical disclosure control (SDC) is to modify statistical data so that it can be published without releasing confidential information that may be linked to specific respondents. The challenge for SDC is to achieve this modification with minimum loss of the detail and accuracy sought by final users. Many measures exist for evaluating the quality of a protection method, but all of them are applicable only to numerical or categorical attributes. In this paper, we present some recent results on time series protection and re-identification. We propose a complete framework for evaluating time series protection methods, and we present some empirical results to show how the framework works.
