Large-Scale Graph-based Transductive Inference

We consider the issue of scalability of graph-based semi-supervised learning (SSL) algorithms. In this context, we propose a fast graph node ordering algorithm that improves (parallel) spatial locality by being cache cognizant. This approach allows for a near linear speedup on a shared-memory parallel machine to be achievable, and thus means that graph-based SSL can scale to very large data sets. We use the above algorithm in a multi-threaded multi-core implementation to solve a SSL problem on a 120 million node graph in a reasonable amount of time.