Developing a Dark Web collection and infrastructure for computational and social sciences

In recent years, numerous studies have analyzed the Internet presence of hate and extremist groups from a variety of perspectives. Yet the websites and forums of extremist and terrorist groups have long remained an underutilized resource for terrorism researchers because of their ephemeral nature and the difficulty of accessing and analyzing them. The purpose of the Dark Web archive is to provide a research infrastructure for social scientists, computer and information scientists, policy and security analysts, and others studying a wide range of social and organizational phenomena and computational problems. The Dark Web Forum Portal provides web-enabled access to critical international jihadist and other extremist web forums. This paper focuses on significant extensions to previous work: increasing the scope of data collection; adding an incremental spidering component for regular data updates; enhancing the searching and browsing functions; enhancing multilingual machine translation for Arabic, French, German, and Russian; and adding advanced social network analysis. The paper concludes with a case study on identifying active participants.
