Sampling dark networks to locate people of interest

Dark networks, which contain covert entities and connections such as those representing illegal activities, are of great interest to intelligence analysts. However, before studying such a network, one must first collect appropriate network data. Collecting accurate data in this setting is challenging: data collectors must make inferences, which may be incorrect, from available intelligence data, which may itself be misleading. In this paper, we consider how to effectively sample dark networks, in which sampling queries may return incorrect information, with the specific goal of locating people of interest. We present RedLearn and RedLearnRS, two algorithms for crawling dark networks that aim to maximize the number of nodes of interest identified under a limited sampling budget. RedLearn assumes that a query on a node accurately reveals whether that node represents a person of interest, while RedLearnRS dispenses with this assumption. We consider realistic error scenarios that describe how individuals in a dark network may attempt to conceal their connections. We evaluate both algorithms on several real-world networks, including dark networks, as well as on various synthetic dark network structures proposed in the criminology literature. Our analysis shows that RedLearn and RedLearnRS match or outperform other sampling strategies.
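The sampling setting described above can be made concrete with a small simulation. The Python sketch below is an illustration under stated assumptions, not RedLearn or RedLearnRS, whose learning models are not described here. A query on a node returns its accurate target label (the assumption RedLearn makes) together with its neighbors, each of which is concealed with a hypothetical probability `hide_prob` to mimic hidden connections. The crawler spends its budget greedily on the boundary node with the most observed target neighbors, a simple guilt-by-association heuristic. The names `noisy_query`, `greedy_target_crawl`, `targets`, and `hide_prob` are illustrative only.

```python
import random
from collections import defaultdict


def noisy_query(graph, node, targets, hide_prob, rng):
    """Query one node and return (is_target, observed_neighbors).

    Hypothetical error model: each incident edge is concealed
    independently with probability `hide_prob`. The target label itself
    is returned accurately, as in the setting RedLearn assumes.
    """
    observed = {v for v in graph[node] if rng.random() >= hide_prob}
    return node in targets, observed


def greedy_target_crawl(graph, targets, seed, budget, hide_prob=0.0, rng=None):
    """Budgeted crawl that always queries the boundary node with the most
    observed target neighbors (a guilt-by-association heuristic, not RedLearn).

    Returns (targets found, nodes queried).
    """
    rng = rng if rng is not None else random.Random(0)
    queried, found = set(), set()
    target_links = defaultdict(int)  # boundary node -> observed target neighbors
    boundary = {seed}
    while boundary and len(queried) < budget:
        # max() returns the first maximal item, so ties go to the smallest id.
        node = max(sorted(boundary), key=lambda v: target_links[v])
        boundary.remove(node)
        queried.add(node)
        is_target, neighbors = noisy_query(graph, node, targets, hide_prob, rng)
        if is_target:
            found.add(node)
        for v in neighbors:
            if v in queried:
                continue
            boundary.add(v)
            if is_target:
                target_links[v] += 1
    return found, queried


if __name__ == "__main__":
    # Toy undirected graph as adjacency sets; nodes 1, 3, 5 are targets.
    graph = {0: {1, 2}, 1: {0, 3}, 2: {0, 4}, 3: {1, 5}, 4: {2}, 5: {3}}
    found, queried = greedy_target_crawl(graph, targets={1, 3, 5}, seed=0, budget=4)
    print(sorted(found), sorted(queried))
```

With `hide_prob=0.0` on the toy graph, the crawler follows the chain of targets 1, 3, 5 and finds all three within a budget of four queries. Raising `hide_prob` conceals edges and can cut the crawl short, which is the kind of degradation the error scenarios above are meant to capture.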