X-BAND: Expiration Band for Anonymizing Varied Data Streams

The Internet of Things (IoT) has formed a whole new layer of the world built on the Internet, reaching every connected device, actuator, and sensor. Many organizations use IoT data streams for research and development. To extract value from these data streams, the data handler must protect the privacy of the individuals they describe. The most common approach to privacy preservation is anonymization. IoT data streams are varied because individuals have different preferences and use a diverse pool of devices. The conventional sliding-window method, in which expiration is driven by a single tuple, cannot anonymize such streams efficiently. Furthermore, anonymizing varied data streams requires minimizing missingness in the published data. We therefore propose X-BAND, an algorithm that uses a new expiration-band mechanism to handle varied data streams and anonymize them efficiently, and we introduce a weighted distance function for X-BAND that reduces the missingness of published data. Our experiments on real data sets show that X-BAND is effective and efficient compared with FADS, a well-known conventional anonymization algorithm. X-BAND showed 5%–11% and 1%–3% less information loss on the real data sets Adult and PM2.5, respectively, while performing similarly on clustering and comparably on reuse, suppression, and runtime. The new weighted distance function is also effective at reducing missingness during anonymization.
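The abstract does not define the weighted distance function, so the following Python sketch is only an illustration of the general idea, not the X-BAND definition. It assumes that a varied stream produces tuples whose attribute sets differ, that numeric attributes are normalized by a known range, and that an attribute present in one tuple but absent in the other incurs a tunable penalty. The names `weighted_distance`, `ranges`, and `missing_weight` are hypothetical.

```python
from typing import Mapping, Optional

# A tuple maps attribute names to numeric values; an absent key or a None
# value means the attribute is missing for that tuple.
Record = Mapping[str, Optional[float]]


def weighted_distance(
    a: Record,
    b: Record,
    ranges: Mapping[str, float],
    missing_weight: float = 1.0,
) -> float:
    """Illustrative distance between two tuples with differing attributes.

    Assumption (not from the paper): a mismatch in missingness costs
    `missing_weight`, so tuples with the same attribute set are drawn
    together and clusters need fewer suppressed or empty cells.
    """
    attrs = set(a) | set(b)
    if not attrs:
        return 0.0
    total = 0.0
    for name in attrs:
        x, y = a.get(name), b.get(name)
        if x is None and y is None:
            continue  # missing on both sides: grouping adds no missingness
        if x is None or y is None:
            total += missing_weight  # present on one side only: penalize
        else:
            span = ranges[name]
            # Normalized absolute difference; a zero range means no spread.
            total += abs(x - y) / span if span else 0.0
    return total / len(attrs)


if __name__ == "__main__":
    ranges = {"temp": 100.0, "hr": 100.0}
    full = {"temp": 20.0, "hr": 60.0}
    partial = {"temp": 30.0}
    print(weighted_distance(full, partial, ranges))
    print(weighted_distance(full, partial, ranges, missing_weight=2.0))
```

Raising `missing_weight` makes tuples with different attribute sets look farther apart, which is one plausible way a distance function could steer clustering toward groups that publish fewer missing values.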
