Web Honeypots for Spies

We are building honeypots for document-collecting spies who search the Web for intelligence information. The goal is to develop tools for assessing the relative degree of interest that representative documents elicit from users. One experiment set up a site with bait documents and used two site-monitoring tools, Google Analytics and AWStats, to analyze the traffic. Much of this traffic was automated ("bots") and showed some interesting differences in the retrieval frequency of documents. We also analyzed bot traffic on a similar real site, the library site at our school. Of nearly one million requests, we concluded that 64% came from bots. 46 identified themselves as bots, 40 came from blacklisted sites, and 12 gave demonstrably false user identifications. Requestors appeared to prefer documents to other types of files, and 40% of the requests did not respect the terms of service on access provided by a robots.txt file.
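The bot signals mentioned above (self-identification, blacklisted origin, and disregard for robots.txt) can be illustrated with a minimal Python sketch. It is not the authors' tooling: the robots.txt rules, the marker substrings, the request fields, and the helper names `make_parser` and `classify` are all assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; the real site's rules are not given in the text.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

# Assumed substrings that suggest a user agent is declaring itself a bot.
BOT_MARKERS = ("bot", "crawler", "spider")


def make_parser(text):
    """Build a robots.txt parser from rule text instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def classify(request, parser, blacklist):
    """Return three independent bot signals for one logged request.

    `request` is a dict with the assumed keys 'agent', 'ip', and 'path'.
    """
    agent = request["agent"]
    return {
        "self_identified": any(m in agent.lower() for m in BOT_MARKERS),
        "blacklisted": request["ip"] in blacklist,
        "violates_robots": not parser.can_fetch(agent, request["path"]),
    }


if __name__ == "__main__":
    parser = make_parser(ROBOTS_TXT)
    sample = {"agent": "ExampleBot/1.0", "ip": "203.0.113.5", "path": "/private/a.pdf"}
    print(classify(sample, parser, blacklist={"203.0.113.5"}))
```

Keeping the signals separate, rather than collapsing them into one yes/no verdict, makes it possible to count each category on its own, as the percentages in the text do. Detecting a false user identification would need further evidence, such as request timing, and is left out of this sketch.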