See the Tree Through the Lines: The Shazoo Algorithm

Predicting the labels of the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since sparsifying a graph via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the node labels of an unweighted tree in a nearly optimal way, no fully satisfactory algorithm is yet available for the weighted case. We fill this gap and introduce Shazoo, an efficient node predictor that is nearly optimal on any weighted tree. Moreover, we show that Shazoo can be viewed as a common nontrivial generalization of the previous approaches for unweighted trees and for weighted lines. Experiments on real-world datasets confirm that Shazoo performs well, in that it fully exploits the structure of the input tree, and comes very close to (and sometimes outperforms) less scalable energy minimization methods.
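To make the setting concrete, the sketch below shows online node label prediction on a weighted tree. Nodes are presented one at a time, the learner predicts a label in {-1, +1}, and then the true label is revealed. It also computes the weighted cutsize of a labeling, which is the total weight of edges joining differently labeled nodes and a common measure of how well a labeling agrees with the edge weights. This is NOT Shazoo. The predictor is a deliberately simple baseline that is assumed here for illustration only: it copies the label of the revealed node that is nearest in resistance distance. On a tree, that distance is the sum of 1/w along the unique path. All function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy representation: a weighted tree stored as an adjacency map
# {node: {neighbor: weight}}; larger weight = stronger similarity. Labels are -1/+1.

def add_edge(tree, u, v, w):
    """Add an undirected edge of positive weight w."""
    tree[u][v] = w
    tree[v][u] = w

def resistance_from(tree, source):
    """Resistance distance from `source` to every node of a tree.

    On a tree the path between two nodes is unique, so the effective
    resistance is the sum of the edge resistances 1/w along that path.
    """
    dist = {source: 0.0}
    stack = [source]
    while stack:
        u = stack.pop()
        for v, w in tree[u].items():
            if v not in dist:
                dist[v] = dist[u] + 1.0 / w
                stack.append(v)
    return dist

def weighted_cutsize(tree, labels):
    """Total weight of edges whose endpoints carry different labels."""
    total = 0.0
    for u in tree:
        for v, w in tree[u].items():
            if u < v and labels[u] != labels[v]:  # count each edge once
                total += w
    return total

def nearest_neighbor_predict(tree, revealed, node):
    """Illustrative baseline (NOT Shazoo): copy the label of the revealed node
    closest to `node` in resistance distance; default to +1 if none is revealed."""
    if not revealed:
        return +1
    dist = resistance_from(tree, node)
    closest = min(revealed, key=lambda r: dist[r])
    return revealed[closest]

def online_mistakes(tree, order, true_labels):
    """Online protocol: predict each node's label, then reveal the true label."""
    revealed, mistakes = {}, 0
    for node in order:
        if nearest_neighbor_predict(tree, revealed, node) != true_labels[node]:
            mistakes += 1
        revealed[node] = true_labels[node]
    return mistakes

# Usage on a weighted line 0-1-2-3 whose middle edge is weak.
tree = defaultdict(dict)
add_edge(tree, 0, 1, 5.0)
add_edge(tree, 1, 2, 0.1)
add_edge(tree, 2, 3, 5.0)
labels = {0: +1, 1: +1, 2: -1, 3: -1}
print(weighted_cutsize(tree, labels))                 # 0.1
print(online_mistakes(tree, [0, 3, 1, 2], labels))    # 1
```

In the example, the weak middle edge places nodes 1 and 2 far apart in resistance distance. After the first mistake on node 3, each remaining node inherits the label of its own strongly connected side. This illustrates why edge weights, and not just tree topology, matter for prediction.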
