Anonymisation of Social Networks and Rough Set Approach

The scientific study of network data can reveal important behaviors of the elements involved as well as social trends. It also gives insight into suitable changes in the social structure and into the roles individuals play in it. There is considerable evidence (HIPAA (2002) Health insurance portability and accountability act. Available online http://www.hhs.gov/ocr/hipaa; Lambert, J Off Stat 9:313–331, 1993; Xu (2006) Utility based anonymisation using local recoding. In: KDD’06, Philadelphia) of the value of social network data in shedding light on the social behavior, health, and well-being of the general public. For this purpose, social network information needs to be published, either publicly or to a specialized group. Depending on the privacy model considered, however, this information may include sensitive data about individual participants that should not be disclosed. Social network data therefore need to be anonymized before publication in order to prevent potential reidentification attacks. Data anonymization techniques are widely used in relational databases (Aggarwal et al. J Priv Technol, 2005; Backstrom et al. (2007) Wherefore art thou R3579X? Anonymized social networks, hidden patterns, and structural steganography. In: International world wide web conference (WWW). ACM, New York, pp 181–190; Bayardo and Agrawal (2005) Data privacy through optimal k-anonymisation. In: IEEE 21st international conference on data engineering, April 2005; Bamba et al. (2008) Supporting anonymous location queries in mobile environments with privacy grid. In: ACM world wide web conference; Byun et al. (2007) Efficient k-anonymisation using clustering techniques. In: International conference on database systems for advanced applications (DASFAA), pp 188–200; Campan and Truta (2008) A clustering approach for data and structural anonymity in social networks. In: ACM SIGKDD workshop on privacy, security, and trust in KDD (PinKDD), Las Vegas; Chakrabarti et al. (2004) R-MAT: a recursive model for graph mining. In: SIAM international conference on data mining; Chawla et al. (2005) Toward privacy in public databases. In: Proceedings of the theory of cryptography conference, Cambridge, MA; Evfimievski et al. (2003) Limiting privacy breaches in privacy preserving data mining. In: ACM principles of database systems (PODS). ACM, New York, pp 211–222; Getoor and Diehl, SIGKDD Explor Newsl 7(2):3–12, 2005; Ghinita et al. (2007) Fast data anonymisation with low information loss. In: Very large data base conference (VLDB), Vienna, pp 758–769; LeFevre et al. (2006) Mondrian multidimensional K-anonymity. In: IEEE international conference of data engineering (ICDE), p 25; Liu and Terzi (2008) Towards identity anonymisation on graphs. In: Wang (ed.) SIGMOD conference. ACM, New York, pp 93–106; Lunacek et al. (2006) A crossover operator for the k-anonymity problem. In: Genetic and evolutionary computation conference (GECCO), Seattle, Washington, pp 1713–1720; Machanavajjhala et al. (2006) L-diversity: privacy beyond K-anonymity. In: IEEE international conference on data engineering (ICDE), Atlanta, p 24; Malin, J Am Med Inform Assoc 12(1):28–34, 2004; Nergiz and Clifton (2006) Thoughts on k-anonymisation. In: IEEE 22nd international conference on data engineering workshops (ICDEW), Atlanta, April 2006, p 96; Nergiz and Clifton (2007) Multirelational k-anonymity. In: IEEE 23rd international conference on data engineering posters, April 2007). However, most known anonymization approaches, such as suppression and generalization, do not apply directly to social network data. One major challenge in social network anonymization is computational complexity. In (Gross and Yellen (2006) Graph theory and its applications. CRC, Boca Raton), it has been proved that a particular k-anonymity problem, which tries to minimize the structural change to the original social network, is NP-hard. Research in anonymization of social networks is a relatively new field. In this chapter, we provide a systematic study of the approaches proposed and the studies carried out so far in this direction. Social network nodes can have imprecise data as their attributes, and the standard methods proposed for anonymization are not suitable for such networks. Recently, an efficient rough set-based algorithm was proposed in (Tripathy and Prakash Kumar, Int J Rapid Manuf 1(2):189–207, 2009) for clustering tuples in relational models. We describe how this algorithm can be used for anonymization of social networks. We also present some recent algorithms that use graph isomorphism for anonymization of social networks. Finally, we discuss the current status of research on anonymization of social networks and present some related problems for further study.
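To make the reidentification concern concrete, the following minimal sketch checks one commonly studied structural notion, k-degree anonymity, in which every node degree must be shared by at least k nodes. It is an illustration only, not the rough set-based or isomorphism-based algorithms of the chapter; the function names and the toy graphs are hypothetical, and the graph is assumed to be undirected and given as an edge list.

```python
from collections import Counter


def degree_sequence(edges):
    """Map each node to its degree in an undirected graph given as an edge list.

    Assumption: isolated nodes do not appear, since only edges are supplied.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def is_k_degree_anonymous(edges, k):
    """Return True if every degree value is shared by at least k nodes."""
    degrees = degree_sequence(edges)
    nodes_per_degree = Counter(degrees.values())
    return all(count >= k for count in nodes_per_degree.values())


# Hypothetical toy graphs.
path = [("a", "b"), ("b", "c"), ("c", "d")]        # degrees 1, 2, 2, 1
star = [("hub", "x"), ("hub", "y"), ("hub", "z")]  # degrees 3, 1, 1, 1

print(is_k_degree_anonymous(path, 2))
print(is_k_degree_anonymous(star, 2))
```

In the star graph the hub is the only node of degree 3, so an adversary who knows a target's degree could single it out; the path graph has no such unique degree for k = 2.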

[1]  Latanya Sweeney,et al.  k-Anonymity: A Model for Protecting Privacy , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[2]  Cynthia Dwork,et al.  Wherefore art thou r3579x?: anonymized social networks, hidden patterns, and structural steganography , 2007, WWW '07.

[3]  Traian Marius Truta,et al.  Privacy Protection: p-Sensitive k-Anonymity Property , 2006 .

[4]  Ellis Horowitz,et al.  Fundamentals of Computer Algorithms , 1978 .

[5]  Alina Campan,et al.  A Clustering Approach for Data and Structural Anonymity in Social Networks , 2008 .

[6]  Jian Pei,et al.  Preserving Privacy in Social Networks Against Neighborhood Attacks , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[7]  Jennifer Blackhurst,et al.  MMR: An algorithm for clustering categorical data using Rough Set Theory , 2007, Data Knowl. Eng..

[8]  B. K. Tripathy,et al.  MMeR: an algorithm for clustering heterogeneous data using rough set theory , 2009 .

[9]  Ninghui Li,et al.  t-Closeness: Privacy Beyond k-Anonymity and l-Diversity , 2007, 2007 IEEE 23rd International Conference on Data Engineering.

[10]  Elisa Bertino,et al.  Efficient k-Anonymization Using Clustering Techniques , 2007, DASFAA.

[11]  Elisa Bertino,et al.  Efficient k-Anonymity Using Clustering Technique , 2006 .

[12]  Rajeev Motwani,et al.  Approximation Algorithms for k-Anonymity , 2005 .

[13]  Roberto J. Bayardo,et al.  Data privacy through optimal k-anonymization , 2005, 21st International Conference on Data Engineering (ICDE'05).

[14]  B. K. Tripathy,et al.  A New Approach to Manage Security against Neighborhood Attacks in Social Networks , 2010, 2010 International Conference on Advances in Social Networks Analysis and Mining.

[15]  Chris Clifton,et al.  Hiding the presence of individuals from shared databases , 2007, SIGMOD '07.

[16]  Ling Liu,et al.  Butterfly: Protecting Output Privacy in Stream Mining , 2008, 2008 IEEE 24th International Conference on Data Engineering.

[17]  Ling Liu,et al.  Supporting anonymous location queries in mobile environments with privacygrid , 2008, WWW.

[18]  Z. Pawlak Rough Sets: Theoretical Aspects of Reasoning about Data , 1991 .

[19]  Danfeng Yao,et al.  The union-split algorithm and cluster-based anonymization of social networks , 2009, ASIACCS '09.

[20]  Milos Hauskrecht,et al.  Noisy-OR Component Analysis and its Application to Link Analysis , 2006, J. Mach. Learn. Res..

[21]  Ashwin Machanavajjhala,et al.  l-Diversity: Privacy Beyond k-Anonymity , 2006, ICDE.

[22]  Pierangela Samarati,et al.  Protecting Respondents' Identities in Microdata Release , 2001, IEEE Trans. Knowl. Data Eng..

[23]  B.K. Tripathy,et al.  A fast p-sensitive l-diversity Anonymisation algorithm , 2011, 2011 IEEE Recent Advances in Intelligent Computational Systems.

[24]  Indrakshi Ray,et al.  A crossover operator for the k-anonymity problem , 2006, GECCO '06.

[25]  Jiawei Han,et al.  Data Mining: Concepts and Techniques , 2000 .

[26]  A. Meyer The Health Insurance Portability and Accountability Act. , 1997, Tennessee medicine : journal of the Tennessee Medical Association.

[27]  Jun-Lin Lin,et al.  An efficient clustering method for k-anonymization , 2008, PAIS '08.

[28]  Raymond Chi-Wing Wong,et al.  (α, k)-anonymity: an enhanced k-anonymity model for privacy preserving data publishing , 2006, KDD '06.

[29]  Pierangela Samarati,et al.  Generalizing Data to Provide Anonymity when Disclosing Information , 1998, PODS 1998.

[30]  Christos Faloutsos,et al.  R-MAT: A Recursive Model for Graph Mining , 2004, SDM.

[31]  Lise Getoor,et al.  Preserving the Privacy of Sensitive Relationships in Graph Data , 2007, PinKDD.

[32]  David J. DeWitt,et al.  Mondrian Multidimensional K-Anonymity , 2006, 22nd International Conference on Data Engineering (ICDE'06).

[33]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[34]  Jian Pei,et al.  Utility-based anonymization using local recoding , 2006, KDD '06.

[35]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[36]  Latanya Sweeney,et al.  Achieving k-Anonymity Privacy Protection Using Generalization and Suppression , 2002, Int. J. Uncertain. Fuzziness Knowl. Based Syst..

[37]  Chris Clifton,et al.  Multirelational k-Anonymity , 2009, IEEE Trans. Knowl. Data Eng..

[38]  Hoeteck Wee,et al.  Toward Privacy in Public Databases , 2005, TCC.

[39]  K. Liu,et al.  Towards identity anonymization on graphs , 2008, SIGMOD Conference.

[40]  Bradley Malin,et al.  Technical Evaluation: An Evaluation of the Current State of Genomic Data Privacy Protection Technology and a Roadmap for the Future , 2004, J. Am. Medical Informatics Assoc..

[41]  Chris Clifton,et al.  Thoughts on k-Anonymization , 2006, ICDE Workshops.

[42]  Lise Getoor,et al.  Link mining: a survey , 2005, SKDD.

[43]  Jerzy W. Grzymala-Busse,et al.  Rough Sets , 1995, Commun. ACM.

[44]  Jiawei Han,et al.  gSpan: graph-based substructure pattern mining , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[45]  John Scott What is social network analysis , 2010 .

[46]  J. Gross,et al.  Graph Theory and Its Applications , 1998 .

[47]  Dan Suciu,et al.  A formal analysis of information disclosure in data exchange , 2004, SIGMOD '04.

[48]  Alexandre V. Evfimievski,et al.  Limiting privacy breaches in privacy preserving data mining , 2003, PODS.