1. Aggregation Strategy

The population of interest is all Canadians living in census subdivisions (as defined by Statistics Canada in [10]) with populations of 5,000 or more. Types of subdivisions include municipalities, cities, towns, and villages, among others. The size of the population of interest is 29,383,430, so our data set consists of 29,383,430 records. Table 1 illustrates how a particular record would appear in the database.

Methodology

We found the partition that solves the optimization problem in Section 3 for 0.0007 < T < 1. This range was enough to show how the optimal node varies with T over all possible values of T (0 to 1). Chart 1 depicts the optimal node as a function of 100T. Hence, given a threshold, we can determine the optimal partition. An information loss measure was also computed over this range of T; it is shown in Chart 2.

Results
[2] Cheong-Ghil Kim, et al. Protecting Privacy Using K-Anonymity with a Hybrid Search Scheme. 2012.
[3] Jean-Pierre Corriveau, et al. A globally optimal k-anonymity method for the de-identification of health data. Journal of the American Medical Informatics Association: JAMIA, 2009.
[4] Khaled El Emam, et al. De-Identification Methods. 2013.
[5] Khaled El Emam, et al. Choosing Metric Thresholds. 2013.
[6] Jian Pei, et al. Utility-based anonymization using local recoding. KDD '06, 2006.
[7] Khaled El Emam, et al. Scope, Terminology, and Definitions. 2013.