Slicing: A New Approach To Privacy Preserving Data Publishing Related To Medical Data-Base Using K-Means Clustering Technique

Several anonymization techniques, such as generalization and bucketization, have been designed for privacy-preserving data publishing. Recent work has shown that generalization loses a considerable amount of information, especially for high-dimensional data. Bucketization, on the other hand, does not prevent membership disclosure and does not apply to data that lack a clear separation between quasi-identifying attributes and sensitive attributes. This paper presents a technique called slicing, which partitions the data both horizontally and vertically. Slicing preserves better data utility than generalization and can be used for membership disclosure protection. Another important advantage of slicing is that it can handle high-dimensional data. The paper also shows how slicing can be used for attribute disclosure protection and develops an efficient algorithm for computing the sliced data. Workload experiments confirm that slicing preserves better utility than generalization and is more effective than bucketization in workloads involving the sensitive attribute. The experiments also demonstrate that slicing can be used to prevent membership disclosure. The data are clustered and classified based on distance measures. A cardiology database is considered for study. The developed model will help doctors and paramedics determine a patient's level of cardiological disease, deduce the required medicines in seconds, and propose them to the patient. The K-means clustering algorithm is used to measure reusability.

INTRODUCTION: Privacy-preserving data publishing has been studied extensively in recent years. Such data contain records, each of which holds information about an individual entity, such as a person, a household, or an organization. Several data anonymization techniques have been proposed.
The most popular ones are generalization [10, 11] for k-anonymity [11] and bucketization [12, 14, 13]. In both approaches, attributes are partitioned into three categories: (1) some attributes are identifiers that can uniquely identify an individual, such as Name or Social Security Number; (2) some attributes are Quasi-Identifiers (QI), which the adversary may already know (possibly from other publicly available databases) and which, when taken together, can potentially identify an individual, e.g., Birthdate, Sex, and Zip code; (3) some attributes are Sensitive Attributes (SAs), which are unknown to the adversary and are considered sensitive, such as Disease and Salary. Both generalization and bucketization first remove the identifiers from the data and then partition the tuples into buckets. The two techniques differ in the next step. Generalization transforms the QI values in each bucket into “less specific but semantically consistent” values so that tuples in the same bucket cannot be distinguished by their QI values. Bucketization separates the SA from the QI by randomly permuting the SA values in each bucket; the anonymized data then consist of a set of buckets with permuted sensitive attribute values. This article considers a database of heart patients in order to focus on cardiological conditions. Reuse is vital in the medical field because previous information is very useful in deducing a patient's current health condition and can help save lives.
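The two bucket-level steps described above (generalizing QI values versus permuting SA values) can be illustrated with a minimal Python sketch. The four records, and the helper names `drop_identifiers`, `partition`, `generalize`, and `bucketize`, are hypothetical and invented for illustration. Forming buckets from consecutive rows and generalizing zip codes by masking everything after their common prefix are simplifying assumptions, not a full k-anonymity algorithm.

```python
import os
import random

# Hypothetical toy records (Name, Age, Sex, Zipcode, Disease), invented for
# illustration. Name is an identifier; Age, Sex, and Zipcode are
# quasi-identifiers (QI); Disease is the sensitive attribute (SA).
RECORDS = [
    ("Alice", 22, "F", "47906", "angina"),
    ("Bob", 24, "M", "47905", "arrhythmia"),
    ("Carol", 52, "F", "47302", "angina"),
    ("Dave", 58, "M", "47304", "heart failure"),
]


def drop_identifiers(records):
    """Shared first step: remove the identifier column (Name)."""
    return [r[1:] for r in records]


def partition(rows, bucket_size):
    """Split tuples into consecutive buckets (a simplifying assumption)."""
    return [rows[i:i + bucket_size] for i in range(0, len(rows), bucket_size)]


def generalize(bucket):
    """Generalization: give every tuple in the bucket the same, less specific QI values."""
    ages = [age for age, _, _, _ in bucket]
    sexes = {sex for _, sex, _, _ in bucket}
    zips = [z for _, _, z, _ in bucket]
    age_range = f"[{min(ages)}-{max(ages)}]"
    sex = next(iter(sexes)) if len(sexes) == 1 else "*"
    # Keep the common zip-code prefix and mask the remaining digits
    # (assumes all zip codes have the same length).
    prefix = os.path.commonprefix(zips)
    zip_code = prefix + "*" * (len(zips[0]) - len(prefix))
    return [(age_range, sex, zip_code, disease) for _, _, _, disease in bucket]


def bucketize(bucket, rng):
    """Bucketization: keep exact QI values but randomly permute the SA values."""
    qi_values = [(age, sex, z) for age, sex, z, _ in bucket]
    sa_values = [disease for _, _, _, disease in bucket]
    rng.shuffle(sa_values)
    return qi_values, sa_values


if __name__ == "__main__":
    buckets = partition(drop_identifiers(RECORDS), bucket_size=2)
    rng = random.Random(0)
    for b in buckets:
        print("generalized:", generalize(b))
        print("bucketized: ", bucketize(b, rng))
```

In the bucketized output the exact QI values survive, so an adversary who already knows a person's QI values can still confirm that the person appears in the table, whereas the generalized output blurs those values at the cost of utility.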
CARDIOLOGY: Cardiology is a medical specialty dealing with disorders of the human heart. The field includes the diagnosis and treatment of disorders such as heart defects, heart failure, and other heart diseases. According to the World Health Organization, India has the highest number of coronary heart disease deaths in the world [2]. This may be attributed not only to a lack of resources but also to the concentration of resources in cities and towns. By using the Internet and reusing cardiology database components, paramedics can deduce the medicines or methods to be used for patients in remote places to temporarily put them out of danger. From the reuse of available data, the required medicines may also be deduced and proposed to the patients. This article presents a methodology that combines clustering with classification, in which patients' data on different diseases are clustered according to their health conditions. Future work, currently at the research stage, would aid ailing patients and could become an important part of doctors' general practice.

SLICING: This section first gives an example to illustrate slicing, then formalizes slicing, compares it with generalization and bucketization, and discusses the privacy threats that slicing can address. Table 1 shows an example original data table and its anonymized versions under various anonymization techniques. The original table is shown in Table 1(a).

International Journal of Engineering Research & Technology (IJERT), Vol. 2, Issue 8, August 2013
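The excerpt stops before the worked Table 1 example, so the following is only a minimal Python sketch of the general slicing idea stated in the abstract: partition the attributes vertically into column groups, partition the tuples horizontally into buckets, and permute each column independently within each bucket. Two choices here are our assumptions, not the paper's confirmed procedure: the column grouping {Age, Sex} / {Zipcode, Disease} is fixed by hand, and the buckets are formed by plain K-means (Lloyd's algorithm) over Age, seeded with the first k points. The toy rows and the names `ATTRS`, `GROUPS`, `kmeans`, and `slice_table` are hypothetical.

```python
import math
import random

# Hypothetical toy table, invented for illustration (not Table 1 of the paper).
ATTRS = ("Age", "Sex", "Zipcode", "Disease")
ROWS = [
    (22, "F", "47906", "angina"),
    (24, "M", "47905", "arrhythmia"),
    (27, "F", "47906", "arrhythmia"),
    (52, "F", "47302", "angina"),
    (58, "M", "47304", "heart failure"),
    (61, "M", "47302", "heart failure"),
]
# Assumed attribute (vertical) partition into two columns.
GROUPS = [("Age", "Sex"), ("Zipcode", "Disease")]


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means on numeric tuples; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]

    def nearest(p):
        # Index of the centroid closest to p (Euclidean distance).
        return min(range(k), key=lambda c: math.dist(p, centroids[c]))

    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p) for p in points]  # assignment step
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def slice_table(rows, attrs, groups, labels, rng):
    """Vertical partition into column groups, horizontal partition into buckets
    (given by labels), then an independent random permutation of each column
    inside each bucket."""
    index = {name: i for i, name in enumerate(attrs)}
    buckets = {}
    for row, lab in zip(rows, labels):
        buckets.setdefault(lab, []).append(row)
    sliced = []
    for lab in sorted(buckets):
        columns = []
        for group in groups:
            # Project every tuple in the bucket onto this column's attributes.
            column = [tuple(row[index[a]] for a in group) for row in buckets[lab]]
            # Permuting columns independently breaks cross-column linkage
            # while keeping the attribute values within a column together.
            rng.shuffle(column)
            columns.append(column)
        # A sliced tuple takes one (permuted) cell from each column.
        sliced.append(list(zip(*columns)))
    return sliced


if __name__ == "__main__":
    # Assumption: buckets come from K-means over Age alone, with k = 2.
    labels = kmeans([(row[0],) for row in ROWS], k=2)
    for i, bucket in enumerate(slice_table(ROWS, ATTRS, GROUPS, labels, random.Random(7))):
        print(f"bucket {i}:", bucket)
```

Because each column group is permuted on its own, the sliced table still shows which Age/Sex pairs and which Zipcode/Disease pairs occur in a bucket, which preserves correlations inside each column, but it no longer shows which pairs belong to the same patient.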

D. Aruna Kumari

[1] Carlos Ordonez et al. Clustering binary data streams with K-means. DMKD '03, 2003.

[2] Ashwin Machanavajjhala et al. L-diversity: privacy beyond k-anonymity. 22nd International Conference on Data Engineering (ICDE'06), 2006.

[3] Cynthia Dwork et al. Calibrating Noise to Sensitivity in Private Data Analysis. TCC, 2006.

[4] Kathrin Kirchner et al. Reusable components for partitioning clustering algorithms. Artificial Intelligence Review, 2009.

[5] Yufei Tao et al. Anatomy: simple and effective privacy preservation. VLDB, 2006.

[6] David J. DeWitt et al. Mondrian Multidimensional K-Anonymity. 22nd International Conference on Data Engineering (ICDE'06), 2006.

[7] Irit Dinur et al. Revealing information while preserving privacy. PODS, 2003.

[8] Cynthia Dwork et al. Differential Privacy. ICALP, 2006.

[9] Ashwin Machanavajjhala et al. Worst-Case Background Knowledge for Privacy-Preserving Data Publishing. 2007 IEEE 23rd International Conference on Data Engineering, 2007.

[10] Latanya Sweeney et al. k-Anonymity: A Model for Protecting Privacy. Int. J. Uncertain. Fuzziness Knowl. Based Syst., 2002.

[11] Qing Zhang et al. Aggregate Query Answering on Anonymized Tables. 2007 IEEE 23rd International Conference on Data Engineering, 2007.

[12] Latanya Sweeney et al. Achieving k-Anonymity Privacy Protection Using Generalization and Suppression. Int. J. Uncertain. Fuzziness Knowl. Based Syst., 2002.