Adaptive Crawling with Multiple Bots: A Matroid Intersection Approach

In this work, we examine the problem of adaptively uncovering network topology in an incomplete network, to support more accurate decision making in real-world applications such as modeling for reconnaissance attacks and network probing. While this problem has been partially studied, we offer a novel take on it by modeling it with a set of crawlers, termed “bots”, which can uncover independent portions of the network in parallel. Accordingly, we develop three adaptive algorithms, which, because information is incomplete, base each decision on previous observations: AGP, a sequential method; FastAGP, a parallel algorithm; and ALSP, an extension of FastAGP that uses local search to improve guarantees. These algorithms are proven to have approximation ratios of 1/3, 1/7, and 1/(5 + ε), respectively. The key to the analysis of these algorithms is the connection between adaptive algorithms and an intersection of multiple partition matroids. We conclude with an evaluation of these algorithms to quantify the impact of both adaptivity and parallelism. We find that, in practice, adaptive approaches perform significantly better than non-adaptive ones, while FastAGP performs nearly as well as AGP in most cases despite operating in a massively parallel fashion. Finally, we show that a balance between the quantity and quality of bots is ideal for maximizing observation of the network.
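To make the matroid-intersection constraint concrete, the following is a minimal sketch of a plain (non-adaptive) greedy selection over an intersection of partition matroids. It is not the paper's AGP, FastAGP, or ALSP. The class `PartitionMatroid`, the function `greedy_over_intersection`, the (bot, node) element encoding, and the toy coverage gain are all assumptions introduced here for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, List, Sequence

Element = Hashable


class PartitionMatroid:
    """Hypothetical helper: a set is independent if it takes at most
    capacity[p] elements from each part p of a partition of the ground set."""

    def __init__(self, part_of: Callable[[Element], Hashable], capacity: dict):
        self.part_of = part_of
        self.capacity = capacity

    def is_independent(self, items: Iterable[Element]) -> bool:
        counts = Counter(self.part_of(x) for x in items)
        return all(n <= self.capacity.get(p, 0) for p, n in counts.items())


def greedy_over_intersection(
    ground: Iterable[Element],
    matroids: Sequence[PartitionMatroid],
    gain: Callable[[List[Element], Element], float],
) -> List[Element]:
    """Repeatedly add the feasible element with the largest marginal gain.
    An element is feasible if adding it keeps the set independent in
    every matroid, i.e. inside the intersection."""
    chosen: List[Element] = []
    remaining = set(ground)
    while True:
        feasible = sorted(
            x for x in remaining
            if all(m.is_independent(chosen + [x]) for m in matroids)
        )
        if not feasible:
            return chosen
        best = max(feasible, key=lambda x: gain(chosen, x))
        chosen.append(best)
        remaining.remove(best)


# Toy instance (assumed data): elements are (bot, node) probe assignments.
bots = ["b1", "b2"]
neighbors = {"u": {1, 2, 3}, "v": {3, 4}, "w": {5}}
ground = [(b, n) for b in bots for n in neighbors]

# Matroid 1: each bot probes at most one node.
one_node_per_bot = PartitionMatroid(lambda e: e[0], {b: 1 for b in bots})
# Matroid 2: each node is probed by at most one bot.
one_bot_per_node = PartitionMatroid(lambda e: e[1], {n: 1 for n in neighbors})


def coverage_gain(chosen, x):
    """Number of neighbors newly revealed by probing x's node."""
    seen = set().union(*(neighbors[n] for _, n in chosen)) if chosen else set()
    return len(neighbors[x[1]] - seen)


picked = greedy_over_intersection(
    ground, [one_node_per_bot, one_bot_per_node], coverage_gain
)
```

In this toy instance each bot ends up with a distinct node. An adaptive variant would additionally observe the outcome of each probe before choosing the next one, which this sketch does not model.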
