Fast Approximation Algorithms for Near-optimal Large-scale Network Monitoring

We study the problem of optimal traffic prediction and monitoring in large-scale networks. Our goal is to determine which subset of K links to monitor in order to "best" predict the traffic on the remaining links in the network. We consider several optimality criteria. The task can be formulated as a combinatorial optimization problem belonging to the family of subset selection problems. Similar NP-hard problems arise in statistics, machine learning, and signal processing; examples include subset selection for regression, variable selection, and sparse approximation. Computing exact solutions is computationally prohibitive. We present new heuristics as well as new efficient algorithms implementing the classical greedy heuristic, which is commonly used to tackle such combinatorial problems. Our approach exploits connections to principal component analysis (PCA) and yields new types of performance lower bounds that do not require submodularity of the objective functions. We show that an ensemble method applied to our new randomized heuristic algorithm often outperforms the classical greedy heuristic in practice. We evaluate our algorithms on several large-scale networks, including real-life networks.
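To make the classical greedy heuristic concrete, the following is a minimal Python sketch, not the efficient implementation proposed in this work. It assumes one particular optimality criterion that the abstract does not specify: the total mean squared error of the best linear predictor of all link traffic given the monitored links, computed from a link-traffic covariance matrix. The names `prediction_error`, `greedy_select`, and `cov` are hypothetical and chosen for illustration only.

```python
import numpy as np


def prediction_error(cov, monitored):
    """Total linear MMSE over all links given the monitored links.

    Assumed criterion: trace(cov) - trace(C_S @ pinv(cov_SS) @ C_S.T),
    where C_S holds the columns of cov indexed by the monitored set S.
    Monitored links contribute zero error, so this equals the error
    on the remaining links.
    """
    if not monitored:
        return float(np.trace(cov))
    s = list(monitored)
    cross = cov[:, s]                                   # n x |S|
    gain = cross @ np.linalg.pinv(cov[np.ix_(s, s)]) @ cross.T
    return float(np.trace(cov) - np.trace(gain))


def greedy_select(cov, k):
    """Pick k links one at a time, each time adding the link that
    yields the lowest prediction error given the links chosen so far."""
    selected = []
    for _ in range(k):
        candidates = [j for j in range(cov.shape[0]) if j not in selected]
        best = min(candidates,
                   key=lambda j: prediction_error(cov, selected + [j]))
        selected.append(best)
    return selected


if __name__ == "__main__":
    # Synthetic traffic: 200 samples on 6 links driven by 2 latent flows.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(200, 2))
    routing = rng.normal(size=(2, 6))
    traffic = latent @ routing + 0.1 * rng.normal(size=(200, 6))
    cov = np.cov(traffic, rowvar=False)
    chosen = greedy_select(cov, 2)
    print("monitored links:", chosen)
    print("residual error:", prediction_error(cov, chosen))
```

This naive version re-evaluates the objective from scratch for every candidate, so each of the K rounds costs one pseudo-inverse per remaining link. It is meant only to show what the greedy rule computes under the stated assumption.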