A parallel data generator for efficiently generating “realistic” social streams

A social stream is a data stream that records a series of social entities and the dynamic interactions between pairs of entities. It can be used to model changes in entity states in numerous applications. Social streams, which combine graph data and streaming data, pose great challenges to efficient analytical query processing and are key to better understanding users’ behavior. Because of privacy and other related concerns, a social stream generator is of great significance. This paper proposes a framework for a synthetic social stream generator (SSG). The social streams generated by SSG can be tuned to capture several kinds of fundamental social stream properties, including patterns of users’ behavior and graph patterns. Extensive empirical studies on several real-life social stream data sets show that SSG produces data that fit real data better. The studies also confirm that SSG can generate social stream data continuously with stable throughput and memory consumption. Furthermore, we propose a parallel implementation of SSG based on an asynchronous parallel processing model and a delayed update strategy. Our experiments verify that the throughput of the parallel implementation increases linearly with the number of nodes.
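To make the two families of properties concrete, here is a minimal Python sketch of a sequential social stream generator. It is not SSG's model. Using Pareto-distributed inter-event times as a stand-in for bursty user behavior, using preferential attachment on in-degree as a stand-in for a skewed graph pattern, and the names `generate_stream`, `Interaction`, and `alpha` are all illustrative assumptions.

```python
import heapq
import random
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Interaction:
    time: float  # event timestamp
    src: int     # user who initiates the interaction
    dst: int     # user who receives it


def generate_stream(num_users: int, num_events: int,
                    alpha: float = 2.5, seed: int = 0) -> Iterator[Interaction]:
    """Yield interactions in timestamp order (hypothetical sketch, not SSG).

    Assumptions made for illustration only:
    - each user's gaps between actions are Pareto(alpha), i.e. heavy-tailed;
    - the target of each interaction is chosen with probability
      proportional to (in-degree + 1), i.e. preferential attachment.
    """
    if num_users < 2:
        raise ValueError("need at least two users to form an interaction")
    rng = random.Random(seed)
    weights = [1] * num_users  # in-degree + 1, so every user can be picked
    # Min-heap of (next activity time, user): pops the earliest pending action.
    queue = [(rng.paretovariate(alpha), u) for u in range(num_users)]
    heapq.heapify(queue)
    for _ in range(num_events):
        t, src = heapq.heappop(queue)
        dst = src
        while dst == src:  # resample until we get a non-self interaction
            dst = rng.choices(range(num_users), weights=weights)[0]
        weights[dst] += 1  # rich get richer
        yield Interaction(t, src, dst)
        # Schedule this user's next action after a heavy-tailed gap.
        heapq.heappush(queue, (t + rng.paretovariate(alpha), src))


if __name__ == "__main__":
    for event in generate_stream(num_users=5, num_events=5, seed=42):
        print(event)
```

Because events come from a priority queue keyed on each user's next activity time, the output is emitted in timestamp order, and the state kept between events (the weight list and the heap) grows with the number of users rather than the number of events. This loosely mirrors the goal of continuous generation with stable memory; how SSG itself achieves that is not shown here.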
