A survey on cost-effective context-aware distribution of social data streams over energy-efficient data centres

Abstract Social media have emerged in the last decade as a viable and ubiquitous means of communication. The ease of user content generation within these platforms (e.g. check-in information and multimedia data), along with the proliferation of Global Positioning System (GPS)-enabled, always-connected capture devices, has led to data streams of unprecedented volume and a radical change in information sharing. Social data streams raise a variety of practical challenges, including the derivation of meaningful real-time insights from effectively gathered social information, as well as a paradigm shift in content distribution that leverages contextual data associated with user preferences, geographical characteristics and devices in general. In this article we present a comprehensive survey that outlines the state of the art and organizes the challenges concerning social media streams and the data centre infrastructure that supports efficient access to data streams, in terms of content distribution, data diffusion, data replication, energy efficiency and network infrastructure. We systematize the existing literature and identify and analyse the main research points and industrial efforts in the area with respect to modelling, simulation and performance evaluation.
