MapReduce Implementations for Privacy Preserving Record Linkage

Over the last decade, the explosive growth of Internet data has fueled the development of Big Data management systems and technologies. The sheer volume of data, combined with the need to link records without compromising privacy, motivates the present study. We describe the Privacy Preserving Record Linkage problem based on Bloom Filter encoding techniques, which protect the privacy of the individuals the records describe while still permitting similarity comparison between encoded records. We then extend our study to the HLSH/FPS private indexing technique and briefly describe four implementations of it in the MapReduce distributed environment, which is capable of processing large-scale data. Finally, we experimentally evaluate these four versions in terms of job execution time, memory usage, and disk usage.
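To illustrate the idea of a Bloom Filter encoding that hides field values yet still supports similarity comparison, the following is a minimal sketch and not the implementation evaluated in this study. It assumes character bigrams, a keyed double-hashing scheme built on HMAC-SHA1 and HMAC-SHA256, and the Dice coefficient as the similarity measure. The names `bloom_encode`, `dice`, and `SECRET_KEY`, as well as the filter size and the number of hash functions, are hypothetical choices made only for illustration.

```python
import hashlib
import hmac

# Hypothetical secret shared by the data owners; an assumption for illustration.
SECRET_KEY = b"shared-secret"


def bigrams(value):
    """Return the set of character bigrams of a padded, lower-cased string."""
    padded = f"_{value.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(value, size=1000, num_hashes=20):
    """Encode a string as the set of bit positions set in a Bloom filter.

    Each bigram is mapped to num_hashes positions with double hashing:
    position_i = (h1 + i * h2) mod size.
    """
    positions = set()
    for gram in bigrams(value):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hmac.new(SECRET_KEY, data, hashlib.sha1).digest(), "big")
        h2 = int.from_bytes(hmac.new(SECRET_KEY, data, hashlib.sha256).digest(), "big")
        for i in range(num_hashes):
            positions.add((h1 + i * h2) % size)
    return positions


def dice(a, b):
    """Dice coefficient of two Bloom filters given as sets of set-bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    smith = bloom_encode("smith")
    smyth = bloom_encode("smyth")
    jones = bloom_encode("jones")
    print("smith vs smyth:", round(dice(smith, smyth), 3))
    print("smith vs jones:", round(dice(smith, jones), 3))
```

Under these assumptions, similar strings share most of their bigrams and therefore most of their set bits, so their Dice score stays high, whereas unrelated strings overlap only through chance hash collisions.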