OSoMe: the IUNI observatory on social media

The study of social phenomena is becoming increasingly reliant on big data from online social networks. Broad access to social media data, however, requires software development skills that not all researchers possess. Here we present the IUNI Observatory on Social Media, an open analytics platform designed to facilitate computational social science. The system leverages a historical, ongoing collection of over 70 billion public messages from Twitter. We illustrate a number of interactive open-source tools to retrieve, visualize, and analyze derived data from this collection. The Observatory, now available at osome.iuni.iu.edu, is the result of a large, six-year collaborative effort coordinated by the Indiana University Network Science Institute.

PeerJ Preprints | https://doi.org/10.7287/peerj.preprints.2008v1 | CC-BY 4.0 Open Access | rec: 29 Apr 2016, publ: 29 Apr 2016
