Secure construction and publication of contingency tables from distributed data

Contingency tables are widely used in many fields to analyze the relationship, or infer the association, between two or more variables. Because they are simple and easy to use, they are often among the first methods applied to newly gathered data. Constructing a contingency table from source data is usually considered straightforward, since all of the data is assumed to be held by a single party. In many cases, however, the collected data is actually distributed among different parties. Although the global contingency table would still be of great interest, privacy and security concerns may prevent the data owners from freely sharing their raw data. In this paper, we propose techniques for the secure construction of contingency tables from both horizontally and vertically partitioned data. Our methods are efficient and secure. We also examine cases where the constructed contingency table may itself leak too much information, and we discuss potential solutions. In particular, to protect sensitive cell values from being inferred from the marginal totals of a constructed contingency table, we further address the problem of how to publish the marginal totals securely.
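To make the horizontally partitioned setting concrete, the following is a minimal sketch, not the protocol proposed in this paper. It assumes that every party holds a local contingency table over the same attributes, that the parties are semi-honest, and that cell counts are combined with plain additive secret sharing. The names `split_into_shares`, `secure_table_sum`, and `MODULUS`, as well as the example counts, are hypothetical, and all parties are simulated in a single process.

```python
import secrets

# Assumption: the modulus exceeds any possible global cell count.
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties, modulus=MODULUS):
    """Split value into n additive shares that sum to value modulo modulus."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_table_sum(local_tables, modulus=MODULUS):
    """Add equally shaped local tables cell by cell using additive shares.

    In a real deployment each party would send one share of every cell to
    every other party; here the exchange is simulated in memory.
    """
    n = len(local_tables)
    rows, cols = len(local_tables[0]), len(local_tables[0][0])
    # received[p][i][j] accumulates the shares sent to party p for cell (i, j).
    received = [[[0] * cols for _ in range(rows)] for _ in range(n)]
    for table in local_tables:
        for i in range(rows):
            for j in range(cols):
                shares = split_into_shares(table[i][j], n, modulus)
                for p, share in enumerate(shares):
                    received[p][i][j] = (received[p][i][j] + share) % modulus
    # Each party publishes only its accumulated shares; summing the published
    # values over all parties recovers the global count of each cell.
    return [
        [sum(received[p][i][j] for p in range(n)) % modulus for j in range(cols)]
        for i in range(rows)
    ]


# Hypothetical 2x2 local tables held by three parties.
party_a = [[12, 3], [5, 20]]
party_b = [[7, 1], [2, 9]]
party_c = [[4, 0], [6, 11]]
global_table = secure_table_sum([party_a, party_b, party_c])
```

Each individual share is uniformly random, so under these assumptions a single party's view reveals nothing about another party's local cell counts. The resulting global table can still leak information by itself, which is the concern behind the secure publication of marginal totals.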