LHS-Based Hybrid Microdata vs Rank Swapping and Microaggregation for Numeric Microdata Protection

In previous work by Domingo-Ferrer et al., rank swapping and multivariate microaggregation have been identified as well-performing masking methods for microdata protection. Recently, Dandekar et al. proposed releasing synthetic microdata, generated with the Latin hypercube sampling (LHS) technique, as an alternative to the original data. The LHS method aims to mimic both the univariate and the multivariate statistical characteristics of the original data. However, LHS-based synthetic data do not allow a one-to-one comparison with the original records, so the overall information loss cannot be estimated with current measures. In this paper we exploit unique features of the LHS method to create hybrid data sets, and we evaluate their performance relative to rank swapping and multivariate microaggregation using generalized information loss and disclosure risk measures.
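The univariate part of the LHS idea can be illustrated with a short sketch. It assumes NumPy, and it assumes synthetic values are produced by mapping stratified uniform samples through each original attribute's empirical quantile function. The helper names `lhs_unit` and `lhs_synthetic` are hypothetical. The sketch reproduces only the marginal distributions and omits any step that restores the original correlation (rank) structure, so it is not Dandekar et al.'s full procedure.

```python
import numpy as np


def lhs_unit(n, d, rng):
    """Latin hypercube sample of n points in [0, 1)^d.

    Each column is split into n equal-probability strata, and every
    stratum receives exactly one point, in an independent random order
    per column.
    """
    u = np.empty((n, d))
    for j in range(d):
        strata = rng.permutation(n)               # one stratum index per row
        u[:, j] = (strata + rng.random(n)) / n    # uniform jitter inside each stratum
    return u


def lhs_synthetic(original, rng):
    """Hypothetical helper: synthetic columns matching each original marginal.

    Maps LHS points through the empirical quantile function of every
    attribute. Correlations between attributes are NOT reproduced here.
    """
    n, d = original.shape
    u = lhs_unit(n, d, rng)
    synth = np.empty((n, d), dtype=float)
    for j in range(d):
        synth[:, j] = np.quantile(original[:, j], u[:, j])
    return synth


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # stand-in for numeric original microdata
S = lhs_synthetic(X, rng)       # synthetic data with similar marginals
```

Because each column of `S` is drawn from every probability stratum exactly once, the univariate distributions track those of `X` closely; matching the multivariate characteristics mentioned above would require an additional rank-reordering step.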