The Big Digger & Puzzler System for Harvesting & Analyzing Data from Social Networks

The Big Digger & Puzzler is a distributed system for harvesting and analyzing data from social networks. It runs on users’ PCs around the world. A Digger collects data as specified by the user; a Puzzler analyzes the data collected by its Digger and can contact other Puzzlers to request analytics data on demand. Puzzlers share only analytics with each other, never the raw data: this reduces bandwidth and storage requirements, and it distributes the compute workload across users who are interested in approximately the same topics and analytics. At a large enough scale, the pool of available analytics is statistically likely to be sufficient to serve many different requests.
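
The division of labor described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the class layout, the method names (`collect`, `local_analytics`, `request_analytics`), the keyword-matching rule, and the "count of matching posts" analytic are all assumptions chosen for illustration, and peers are called in-process instead of over a network.

```python
from dataclasses import dataclass, field


@dataclass
class Digger:
    """Collects raw posts matching user-specified keywords (hypothetical interface)."""
    keywords: set
    raw_posts: list = field(default_factory=list)

    def collect(self, stream):
        # Keep only posts that contain at least one of the user's keywords.
        for post in stream:
            if set(post.lower().split()) & self.keywords:
                self.raw_posts.append(post)


@dataclass
class Puzzler:
    """Analyzes its own Digger's data and exchanges analytics with peers."""
    digger: Digger
    peers: list = field(default_factory=list)

    def local_analytics(self, topic):
        # Assumed analytic: number of locally stored posts mentioning the topic.
        # Only this small summary is ever handed to another Puzzler.
        topic = topic.lower()
        count = sum(1 for p in self.digger.raw_posts if topic in p.lower().split())
        return {"topic": topic, "matching_posts": count}

    def request_analytics(self, topic):
        # Combine the local summary with summaries requested from peers.
        # Raw posts never leave the node that collected them.
        total = self.local_analytics(topic)["matching_posts"]
        for peer in self.peers:
            total += peer.local_analytics(topic)["matching_posts"]
        return {"topic": topic.lower(), "matching_posts": total}


# Two users with overlapping interests, each running a Digger and a Puzzler.
alice = Puzzler(Digger({"storm"}))
bob = Puzzler(Digger({"storm", "rain"}))
alice.digger.collect(["Storm warning today", "nice weather"])
bob.digger.collect(["rain and storm", "storm again", "sunny"])
alice.peers.append(bob)
combined = alice.request_analytics("storm")
```

In this sketch, `alice` obtains a combined count without ever seeing the posts held by `bob`; only the small summary dictionary crosses the boundary between nodes, which is the property that keeps bandwidth and storage requirements low.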
