Efficient Use of Differentially Private Binary Trees

Binary trees can be made differentially private by adding noise to every node and leaf. In this form they allow multifaceted exploration of a variable without revealing any individual information. A differentially private binary tree can be used and read just like its conventional exact-valued analog. However, different combinations of nodes contain overlapping answers to the same question, so the statistical properties of multiple measurements under measurement error can be applied to noisy binary trees to create statistically efficient node estimates. We construct estimators that correctly use all available information in the tree, decreasing the error of nodes by up to eighty percent for the same level of privacy protection.

Differentially private binary trees are important summary statistics for a broad variety of uses and algorithms. They are central to the algorithm of Dwork et al. (2010) for releasing private streaming data, and are used in numerous adaptations of this problem, such as Chan et al. (2012), Cao et al. (2013), and Thakurta and Smith (2013). A binary tree can be used to compose probability densities and cumulative distributions of variables, to answer range queries, and to compute means, medians, modes and variances by Monte Carlo integration. Thus the release of a private binary tree can be a broadly useful privacy-preserving means of allowing exploration of a variable, as for example in the statistical release of a non-interactive curator (Dwork and Smith, 2009).

Due to the importance of binary trees, several heuristic adaptations of their use have been studied that bring about some improved accuracy for the same level of privacy guarantee (Hay et al. 2010, Xu et al. 2012, and relatedly Xiao et al. 2010). We show here how to derive an optimally efficient use of a private binary tree, in the precise sense of providing minimum variance unbiased estimates.
That is, we show how to refine a private tree in a manner that makes full and optimal use of all the information the tree contains. This is accomplished by linking the tree structure to the known statistical properties of multiple measures under measurement error.

1 Problem Statements

Consider a perfect binary tree in which every node, t_i, is the sum of all leaves below that node plus a random draw, ε_i, from some fixed distribution f(·), constructed to guarantee differential privacy. As represented below, the true values at the leaves are denoted a through h, and the differentially private value revealed at any node is given to its right. The index i both numbers the nodes sequentially and describes the path from the top of the tree to that node, as a sequence of left (0) and right (1) progressions.
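
As a rough illustration of this setup, the following Python sketch builds such a tree for eight leaves. Several choices here are assumptions and are not taken from the text: the noise distribution f(·) is taken to be Laplace, its scale is set to the number of tree levels divided by a privacy-loss parameter (called `budget` here, and appropriate only if changing one individual moves one leaf by at most one unit), and nodes are numbered in heap order starting from 1, so that the binary digits of the index after the leading 1 spell out the left (0) and right (1) path. The function names `build_private_tree` and `path` are hypothetical.

```python
import numpy as np


def build_private_tree(leaves, budget, rng=None):
    """Return (true_sums, noisy_sums), both dicts keyed by heap index 1, 2, ...

    Assumptions (not from the source): Laplace noise, and a scale of
    levels / budget, where `budget` is the privacy-loss parameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    leaves = np.asarray(leaves, dtype=float)
    n = leaves.size
    if n == 0 or n & (n - 1):
        raise ValueError("a perfect binary tree needs 2**d leaves")

    # A unit change in one leaf alters exactly one node per level.
    levels = n.bit_length()  # e.g. 8 leaves -> 4 levels, root included
    scale = levels / budget

    # Leaves occupy heap indices n .. 2n-1; each parent k sums 2k and 2k+1.
    true_sums = {n + j: value for j, value in enumerate(leaves)}
    for k in range(n - 1, 0, -1):
        true_sums[k] = true_sums[2 * k] + true_sums[2 * k + 1]

    # Every node and leaf receives its own independent noise draw.
    noisy_sums = {k: s + rng.laplace(0.0, scale) for k, s in true_sums.items()}
    return true_sums, noisy_sums


def path(k):
    """Left (0) / right (1) path from the root to heap index k."""
    return bin(k)[3:]  # drop the '0b' prefix and the leading 1


if __name__ == "__main__":
    a_to_h = [1, 2, 3, 4, 5, 6, 7, 8]  # illustrative leaf values only
    true_sums, noisy_sums = build_private_tree(a_to_h, budget=1.0)
    for k in sorted(noisy_sums):
        print(k, repr(path(k)), true_sums[k], round(noisy_sums[k], 2))
```

In this sketch any interior node can be read in two ways, directly from its own noisy value or as the sum of its two noisy children, which is the kind of overlapping information the estimators in this paper are designed to exploit.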