Two 1%s Don't Make a Whole: Comparing Simultaneous Samples from Twitter's Streaming API

We compare samples of tweets from the Twitter Streaming API constructed from different connections that tracked the same popular keywords at the same time. We find that on average, over 96% of the tweets seen in one sample are seen in all others. Those tweets found only in a subset of samples do not significantly differ from tweets found in all samples in terms of user popularity or tweet structure. We conclude they are likely the result of a technical artifact rather than any systematic bias.
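The overlap measure described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `shared_fraction`, the connection labels, and the toy tweet IDs are all hypothetical, and each sample is assumed to be reducible to a set of tweet IDs.

```python
from typing import Dict, Set


def shared_fraction(samples: Dict[str, Set[int]]) -> Dict[str, float]:
    """For each sample, return the share of its tweet IDs that appear in every sample."""
    if not samples:
        return {}
    # Tweet IDs present in all samples at once.
    common = set.intersection(*samples.values())
    return {
        name: (len(common) / len(ids) if ids else 0.0)
        for name, ids in samples.items()
    }


# Toy data (hypothetical): three connections tracking the same keyword.
samples = {
    "conn_a": {1, 2, 3, 4, 5},
    "conn_b": {1, 2, 3, 4, 6},
    "conn_c": {1, 2, 3, 4},
}

fractions = shared_fraction(samples)
# Mean over samples of the fraction of tweets seen in all other samples.
average = sum(fractions.values()) / len(fractions)
print(fractions, average)
```

Averaging the per-sample fractions gives a single summary figure comparable in spirit to the "over 96%" reported above, though the paper's exact computation may differ.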

Kathleen M. Carley | Kenneth Joseph | Peter Landwehr

[1] Rui Li, et al. Towards Social Data Platform: Automatic Topic-focused Monitor for Twitter Stream, 2013, Proc. VLDB Endow.

[2] Nedjeljko Frančula. The National Academies Press, 2013.

[3] D. Boyd, et al. Critical Questions for Big Data, 2012.

[4] Huan Liu, et al. Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose, 2013, ICWSM.

[5] Ning Wang, et al. Assessing the Bias in Communication Networks Sampled from Twitter, 2012, ArXiv.

[6] Krishna P. Gummadi, et al. On sampling the wisdom of crowds: random vs. expert sampling of the twitter stream, 2013, CIKM.

[7] Duncan J. Watts, et al. Everyone's an influencer: quantifying influence on twitter, 2011, WSDM '11.

[8] Duncan J. Watts, et al. Who says what to whom on twitter, 2011, WWW.

[9] Leysia Palen, et al. Microblogging during two natural hazards events: what twitter may contribute to situational awareness, 2010, CHI.