Learning Data Dependency with Communication Cost

In this paper, we consider the problem of recovering a graph that represents the statistical dependency among the data samples generated by a set of nodes; this graph provides the basic structure for performing an inference task such as MAP (maximum a posteriori) estimation, and the problem is commonly referred to as structure learning. When the nodes are spatially separated, running an inference algorithm requires a non-negligible amount of message passing and thus incurs communication cost. Because the learned data-dependency graph and the physical connectivity graph often differ substantially, there is an inherent trade-off between the accuracy of structure learning and the cost of performing a given message-passing-based inference task. We formalize this trade-off as an optimization problem whose output is a data-dependency graph that jointly accounts for learning accuracy and message-passing cost. We focus on distributed MAP as the target inference task due to its popularity, and consider two implementations, ASYNC-MAP and SYNC-MAP, which use different message-passing mechanisms and therefore have different cost structures. For ASYNC-MAP, we propose an optimal polynomial-time learning algorithm motivated by the maximum-weight spanning tree problem. For SYNC-MAP, we first prove that the learning problem is NP-hard and then propose a greedy heuristic. For both implementations, we use the large deviation principle to quantify how the probability that the learned data graph differs from the ideal data graph decays as the number of data samples grows; the decay rate is characterized by topological properties of both the original data-dependency and physical connectivity graphs as well as the degree of the trade-off, which provides a guideline on how many samples are needed to achieve a desired learning accuracy. We validate our theoretical findings through extensive simulations, which show a good match with the analysis.
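To make the spanning-tree flavor of the ASYNC-MAP learning algorithm concrete, the following is a minimal sketch of cost-penalized maximum-weight spanning tree learning: edges are scored by empirical mutual information minus a communication-cost penalty, and a tree is built Kruskal-style. The score function, the trade-off parameter `lam`, and all helper names are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: cost-penalized maximum-weight spanning tree learning.
# The edge score (mutual information minus lam * cost) is an assumption made
# for illustration; the paper's exact objective and algorithm may differ.
from itertools import combinations
import math


def empirical_mutual_information(x, y):
    """Plug-in estimate of I(X;Y) for two discrete sample vectors of equal length."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    mi = 0.0
    for (a, b), c in joint.items():
        p_ab = c / n
        # p_ab * log( p_ab / (p_a * p_b) )
        mi += p_ab * math.log(p_ab * n * n / (px[a] * py[b]))
    return mi


def learn_tree(samples, comm_cost, lam=0.1):
    """Kruskal-style maximum-weight spanning tree over cost-penalized edge scores.

    samples:   dict node -> list of observations (same length for all nodes)
    comm_cost: dict frozenset({u, v}) -> message-passing cost of edge (u, v)
    lam:       trade-off parameter between dependency strength and cost (assumed)
    """
    nodes = list(samples)

    # Score every candidate edge: empirical dependence minus weighted cost.
    scored = []
    for u, v in combinations(nodes, 2):
        w = empirical_mutual_information(samples[u], samples[v])
        w -= lam * comm_cost.get(frozenset((u, v)), 0.0)
        scored.append((w, u, v))
    scored.sort(reverse=True)

    # Union-find keeps the selected edge set acyclic.
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in scored:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v, w))
        if len(tree) == len(nodes) - 1:
            break
    return tree


if __name__ == "__main__":
    # Toy usage: node "c" is strongly correlated with "a" but expensive to reach,
    # so a large enough lam steers the tree toward cheaper edges.
    samples = {
        "a": [0, 0, 1, 1, 0, 1, 1, 0],
        "b": [0, 1, 1, 1, 0, 1, 0, 0],
        "c": [0, 0, 1, 1, 0, 1, 1, 0],
    }
    comm_cost = {frozenset(("a", "c")): 5.0,
                 frozenset(("a", "b")): 1.0,
                 frozenset(("b", "c")): 1.0}
    print(learn_tree(samples, comm_cost, lam=0.2))
```

With `lam = 0` this reduces to plain Chow-Liu-style tree learning on mutual information; increasing `lam` trades dependency strength for lower message-passing cost, which is the trade-off the optimization formulation above captures.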
