Chappie Swarm: Persona-Driven Web Corpus Generation

A common problem for security researchers is the lack of publicly available network traffic traces. In this paper we present Chappie Swarm, which seeks to emulate human internet browsing behavior. The experimenter launches a number of automated "chappies" that assume predefined personas, actively query websites, record their own browsing behavior, and save the resulting network trace as a packet capture file. Unlike other traffic generators, Chappie Swarm relies on this "persona" approach and does not need to be "seeded" with a previously recorded traffic capture.
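To make the persona idea concrete, the following is a minimal sketch of how a persona could steer a browsing session. It is not the Chappie Swarm implementation: the `Persona` fields, the `TOY_WEB` link graph, and the `browse` function are all hypothetical names chosen for illustration, the "web" is an in-memory dictionary rather than live sites, and packet capture (which would be done by a separate capture tool) is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Persona:
    """Hypothetical persona: topic weights plus a link-following habit."""
    name: str
    interests: dict           # topic -> relative weight (assumed structure)
    follow_prob: float = 0.8  # chance of following a link instead of jumping


# Toy in-memory "web": url -> (topic, outgoing links). Purely illustrative.
TOY_WEB = {
    "news.example/home": ("news", ["news.example/tech", "sports.example/scores"]),
    "news.example/tech": ("tech", ["news.example/home", "shop.example/laptops"]),
    "sports.example/scores": ("sports", ["news.example/home"]),
    "shop.example/laptops": ("tech", ["news.example/tech"]),
}


def pick_weighted(pages, persona, rng):
    """Choose one page, favoring topics the persona cares about."""
    # Topics the persona does not list get a small default weight (0.1),
    # so every page stays reachable and the weights never sum to zero.
    weights = [persona.interests.get(TOY_WEB[p][0], 0.1) for p in pages]
    return rng.choices(pages, weights=weights, k=1)[0]


def browse(persona, steps, seed=0):
    """Return the list of pages visited in one simulated session."""
    rng = random.Random(seed)  # seeded so a session can be reproduced
    current = pick_weighted(list(TOY_WEB), persona, rng)
    history = [current]
    for _ in range(steps - 1):
        links = TOY_WEB[current][1]
        if links and rng.random() < persona.follow_prob:
            candidates = links          # follow a link on the current page
        else:
            candidates = list(TOY_WEB)  # jump to any page, e.g. via a bookmark
        current = pick_weighted(candidates, persona, rng)
        history.append(current)
    return history


if __name__ == "__main__":
    techie = Persona("techie", {"tech": 5.0, "news": 1.0})
    for url in browse(techie, steps=10, seed=42):
        print(url)
```

In this sketch, two personas with different `interests` produce different visit sequences over the same graph, and no previously recorded trace is needed to start a session. A real run would replace `TOY_WEB` with actual page fetches while a capture tool records the traffic.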
