Graphs and privacy

Organizations that disseminate data, such as the Census Bureau and health care organizations, give researchers the ability to study and, hopefully, improve aspects of the world around us. The data these organizations collect frequently contain information about specific individuals. This enables us to study characteristics of the population, but it also opens up the potential for abuse: an adversary could discover sensitive information, such as a disease or income, about specific individuals. Therefore, measures must be taken to limit the disclosure of confidential information about individuals while preserving the utility of the data for studying the population. This thesis explores how graph structures can be exploited to create anonymized datasets with high utility. We first examine the case where the data consist of a single table. Using ideas from graphical and loglinear models, we show how to produce anonymized data with significantly higher utility. We then initiate a study of privacy-preserving data publishing for relational and self-linking data, a new direction that, to the best of our knowledge, has not been studied in the literature.