Finding ways to aggregate patient records so as to preserve patient privacy while also providing useful information to medical researchers

1. Aggregation Strategy: The population of interest is all Canadians living in census subdivisions, as defined by Statistics Canada in [10], with populations of 5000 or more. Types of subdivisions include municipalities, cities, towns, and villages. The size of the population of interest is 29,383,430, so our data consists of 29,383,430 records. Table 1 illustrates how a particular record would appear in the database.

Methodology

We found the partition that solves the optimization problem in Section 3 for 0.0007 < T < 1. This was enough to show how the optimal node varies with T over all possible values of T (0 to 1). Chart 1 depicts the optimal node as a function of 100T. Hence, given a threshold, we can determine the optimal partition. An information loss measure was computed over the same range of T and is shown in Chart 2.

Results