Informatik-Spektrum: Organ der Gesellschaft für Informatik e.V. und mit ihr assoziierter Organisationen

Organic Design of Massively Distributed Systems: A Complex Networks Perspective

The vision of Organic Computing addresses challenges that arise in the design of future information systems composed of numerous, heterogeneous, resource-constrained, and error-prone components. The notion “organic” highlights the idea that, in order to be manageable, such systems should exhibit self-organization, self-adaptation, and self-healing characteristics similar to those of biological systems. In recent years, the principles underlying these characteristics have increasingly been investigated from the perspective of complex systems science, particularly using the conceptual framework of statistical physics and statistical mechanics. In this article, we review some of the interesting relations between statistical physics and networked systems and discuss applications in the engineering of organic overlay networks with predictable macroscopic properties.

[...] on the emergence of complex structures and collective dynamics in networks, an area that has been particularly active and successful during the last decade. In Sect. “Overlays, Random Graphs, and Complex Networks”, we summarize structured and unstructured approaches to the management of overlay networks. There we additionally review the relevance of random graph theory for the design of unstructured systems and introduce the relations between statistical mechanics and the study of complex networks, as well as the modeling of dynamical processes. In Sect. “Managing Organic Overlays – A Thermodynamic Perspective”, we discuss abstractions from statistical mechanics and statistical physics in the design of organic overlay networks. In Sect. “Conclusion and Outlook”, we summarize challenges and opportunities of using complex systems science in the engineering of distributed systems with predictable and controllable self-* properties.
Overlays, Random Graphs, and Complex Networks

Overlay networks, which define virtual connections on top of physical communication infrastructures, are becoming increasingly important. As argued in [45], the ability to define communication topologies and protocols at the application layer, without requiring a (potentially globally) coordinated change of existing protocols, standards, and communication infrastructures, is an important factor for the quick proliferation of novel services on the Internet as well as in large-scale data centers. Research on overlay topologies and on efficient distributed algorithms that provide core functionality such as search, routing, and content dissemination has received a lot of attention recently. Most of this research has been done in the context of P2P systems, which are now increasingly used for the cost-efficient distribution of data (for example, by means of the BitTorrent protocol), for the provision of video-telephony services like Skype, or even to address challenges emerging in large-scale scientific setups like the Large Hadron Collider [43].

One usually distinguishes structured and unstructured approaches to the management of overlay topologies. Most of the currently deployed systems belong to the former category. In such structured systems, virtual connections between machines are created in a globally consistent way to construct a particular network topology. While this allows for the development of highly efficient algorithms for distributed search, routing, or information dissemination, the major difficulty is to maintain this fine-tuned topology under dynamic conditions. Reconsidering the scenario outlined in Sect. “Introduction”, maintaining fine-tuned structures will entail massive complexity due to the excessive fluctuation of participating devices and the associated concurrency.
In fact, for the distributed hash table Chord it has been argued in [7] that, in settings with very large numbers of highly dynamic participants, the communication overhead imposed by mere topology maintenance and management schemes could exceed the cost of the actual data transfer operations and thus dominate performance. It has further been argued that designing, implementing, and debugging topology maintenance schemes poses a huge challenge due to the massive concurrency introduced by failing or joining machines. These problems of structured overlays are well known in the literature and call into question their usability in future scenarios like the one laid out in Sect. “Introduction”. Hence, alternative approaches for dealing with large and dynamic settings are being studied.

Informatik_Spektrum_35_2_2012

Unstructured Topologies and Random Graph Theory

A straightforward idea is to use unstructured overlays in which virtual connections between machines are created in a simple, uncoordinated fashion while still allowing all machines to communicate with each other. While this reduces the overhead of topology management, it necessitates probabilistic algorithms, for example for distributed search or routing, that make no (or at least less specific) assumptions about the structure of the network or the placement of data items. Such schemes are inevitably less efficient than those tailored to a particular network structure. Nevertheless, they are significantly simpler to implement and allow for larger degrees of freedom in adapting the network structure to operational conditions. In terms of modeling performance and robustness, most unstructured approaches to the management of overlays rely, either explicitly or implicitly, on results from the field of random graph theory, which was established more than 50 years ago [20].
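To make the idea of uncoordinated connection setup and probabilistic search more concrete, the following Python sketch builds a small unstructured overlay and runs a random-walk query on it. It is an illustration under stated assumptions, not a protocol taken from the works cited here: the function names `build_unstructured_overlay` and `random_walk_search`, the rule that each joining machine links to a few uniformly chosen existing machines, and the hop budget are all hypothetical choices made for this example. Note that this join rule is not the G(n, p) model; it is merely one simple way to obtain a connected random topology without global coordination.

```python
import random


def build_unstructured_overlay(num_nodes, links_per_join, rng):
    """Build an overlay in which each joining node links to a few random peers.

    Assumption for this sketch: a joining node knows the identifiers of the
    nodes that joined before it and picks its peers uniformly among them.
    """
    neighbors = {0: set()}
    for node in range(1, num_nodes):
        k = min(links_per_join, node)        # cannot link to more peers than exist
        peers = rng.sample(range(node), k)   # uniform, uncoordinated choice
        neighbors[node] = set(peers)
        for peer in peers:
            neighbors[peer].add(node)        # virtual connections are bidirectional
    return neighbors


def random_walk_search(neighbors, start, holds_item, max_hops, rng):
    """Forward a query to a random neighbor until a node holding the item is found.

    Returns the number of hops taken, or None if the hop budget is exhausted.
    The walk uses no knowledge about the topology or the placement of items.
    """
    current = start
    for hops in range(max_hops + 1):
        if holds_item(current):
            return hops
        if hops == max_hops or not neighbors[current]:
            break
        current = rng.choice(sorted(neighbors[current]))
    return None


if __name__ == "__main__":
    rng = random.Random(2012)
    overlay = build_unstructured_overlay(num_nodes=500, links_per_join=3, rng=rng)
    # Hypothetical placement: every 50th node stores a replica of the item.
    hops = random_walk_search(overlay, start=0,
                              holds_item=lambda node: node % 50 == 25,
                              max_hops=10_000, rng=rng)
    print("hops needed by the random walk:", hops)
```

Because the walk makes no assumption about where items are stored, the number of hops it needs varies from run to run. This variability is the efficiency price that unstructured schemes pay for their simplicity.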
In order to explain the analogies between large, dynamic networked systems and statistical mechanics, we briefly recall one of the basic models of random graph theory. The so-called G(n, p) model defines a probability space that contains all possible graphs or networks¹ G with n nodes. Assuming that edges between pairs of nodes are being generated by a stochastic process with uniform probability p, the G(n, p) model assigns each network G with n nodes and m edges the same probability to be created:

$$P_G(n, p) = p^m \cdot (1 - p)^{n(n-1)/2 - m}$$

This simple stochastic model for networks has been used in the modeling of a variety of real-world networks. In particular, one can use it to make predictions about the properties of unstructured overlays, if virtual connections are assumed to be created at random with probability p or, alternatively, if an average number of p · n(n - 1)/2 connections are established between randomly chosen pairs of nodes. In general, in the study of random networks one is particularly interested in properties that hold for a subset of network realizations whose probability measure converges to 1 as the size of the generated networks (in terms of the number of nodes) increases. In this case one can say that a property holds asymptotically almost surely for a randomly generated network. This is because the probability to draw a network that does not exhibit the property in question quickly vanishes. An authoritative overview of the interesting results derived from this perspective can be found in [14].

¹ Throughout this article, we will use the terms graph and network interchangeably.

Two well-known examples of particular relevance for the design of overlay networks are results on the critical percolation threshold and the diameter. The critical percolation threshold refers to a point in the G(n, p) model’s parameter p above which the generated networks almost surely contain a connected component that is of the order of the network size.
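The G(n, p) model can be explored numerically with a few lines of Python. The sketch below is an illustration only: the helper names (`graph_probability`, `total_probability`, `sample_gnp`, `largest_component_fraction`) are made up for this example, the sampler tests every pair of nodes and is therefore suitable only for small n, and sweeping the mean degree c = p · n is a choice made for this demonstration. The code evaluates the probability formula given above, checks that the probabilities of all graphs with n nodes add up to one, and reports how much of a sampled network is covered by its largest connected component.

```python
import math
import random
from collections import deque


def graph_probability(n, m, p):
    """Probability that G(n, p) yields one specific graph with n nodes and m edges."""
    max_edges = n * (n - 1) // 2
    return p ** m * (1.0 - p) ** (max_edges - m)


def total_probability(n, p):
    """Sum the probabilities of all graphs with n nodes, grouped by edge count m."""
    max_edges = n * (n - 1) // 2
    # There are C(max_edges, m) distinct graphs with exactly m edges.
    return sum(math.comb(max_edges, m) * graph_probability(n, m, p)
               for m in range(max_edges + 1))


def sample_gnp(n, p, rng):
    """Draw one graph from G(n, p) as an adjacency list (quadratic in n)."""
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:   # each node pair is connected independently
                adj[u].append(v)
                adj[v].append(u)
    return adj


def largest_component_fraction(adj):
    """Fraction of nodes in the largest connected component (breadth-first search)."""
    n = len(adj)
    seen = [False] * n
    best = 0
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if not seen[v]:
                    seen[v] = True
                    queue.append(v)
        best = max(best, size)
    return best / n


if __name__ == "__main__":
    print("sum over all graphs with 5 nodes:", total_probability(5, 0.3))
    rng = random.Random(42)
    n = 1000
    for c in (0.5, 1.0, 1.5, 3.0):   # mean degree c = p * n
        frac = largest_component_fraction(sample_gnp(n, c / n, rng))
        print(f"mean degree {c:.1f}: largest component covers {frac:.3f} of the nodes")
```

For small mean degrees the largest component should cover only a tiny fraction of the nodes, whereas for mean degrees well above 1 it should span most of the network. The exact numbers depend on n and on the random seed.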
For the G(n, p) model it has been found that the connected components of a random graph are with high probability of the order log(n) if p < 1/n. For p > 1/n the largest connected component is of the order n [20]². In practical terms, this result is a crucial prerequisite for the feasibility of unstructured overlay management schemes, since it implies that, if at least a certain minimum number of connections is created in a random and uncoordinated fashion, all machines will be able to communicate with each other with high probability.

² Interestingly, this is a so-called double-jump transition, i.e., for p = 1/n the size of the largest connected component is of the order $n^{2/3}$.

Another set of results that is important for overlays with random structures relates the parameter p to the diameter of the resulting topology. It further gives a criterion for the emergence of so-called small-world topologies, which are assumed to have a diameter of the order of the logarithm of the network size. For the G(n, p) model, it has been shown that the diameter is with high probability of order log(n)/log(np) if the average number of links per node is at least 1. In the design of unstructured topologies, this argument is crucial to reason about the efficiency of search and routing schemes.

Statistical Mechanics of Complex Networks

As argued in [15], the existence of so-called critical points in the G(n, p) model’s parameter p, and the associated sudden change of macroscopic qualities like diameter or connectedness, highlights interesting relations to phase transition phenomena in statistical physics, i.e., sudden changes of material properties as aggregate control parameters (e.g., temperature or pressure) change slightly. In recent years, these analogies to fundamental natural phenomena have been substantially deepened by reframing the study of random graph structures in terms of statistical mechanics and statistical physics (see, e.g., [2, 11, 19, 21, 23, 37]). This perspective is possible since statistical mechanics reasons about configurations of many-particle systems, just like random graph theory reasons about network configurations. Each of these particle configurations, the so-called microstate, fixes the exact positions and energy states of all particles present in a given volume of space at a given temperature and total energy. On the basis of energy distributions, p

[1] Paul Erdös, et al. On random graphs, I, 1959.

[2] W. K. Hastings, et al. Monte Carlo Sampling Methods Using Markov Chains and Their Applications, 1970.

[3] Joel E. Cohen, et al. Threshold phenomena in random structures, 1988, Discret. Appl. Math.

[4] Doug Terry, et al. Epidemic algorithms for replicated database maintenance, 1988, OPSR.

[5] Van Jacobson, et al. The synchronization of periodic routing messages, 1994, TNET.

[6] Bruce A. Reed, et al. A Critical Point for Random Graphs with a Given Degree Sequence, 1995, Random Struct. Algorithms.

[7] Reka Albert, et al. Mean-field theory for scale-free random networks, 1999.

[8] James H. Aylor, et al. Computer for the 21st Century, 1999, Computer.

[9] Cohen, et al. Resilience of the internet to random breakdowns, 2000, Physical review letters.

[10] S. Havlin, et al. Breakdown of the internet under intentional attack, 2000, Physical review letters.

[11] Lada A. Adamic, et al. Search in Power-Law Networks, 2001, Physical review. E, Statistical, nonlinear, and soft matter physics.

[12] Albert-László Barabási, et al. Statistical mechanics of complex networks, 2001, ArXiv.

[13] M E J Newman, et al. Community structure in social and biological networks, 2001, Proceedings of the National Academy of Sciences of the United States of America.

[14] Johannes Berg, et al. Correlated random networks, 2002, Physical review letters.

[15] Mauricio Barahona, et al. Synchronization in small-world systems, 2002, Physical review letters.

[16] Sergey N. Dorogovtsev, et al. Principles of statistical mechanics of random networks, 2002, ArXiv.

[17] David R. Karger, et al. Looking up data in P2P systems, 2003, CACM.

[18] Mark E. J. Newman, et al. The Structure and Function of Complex Networks, 2003, SIAM Rev.

[19] Alessandro Vespignani, et al. Efficiency and reliability of epidemic data dissemination in complex networks, 2003, Physical review. E, Statistical, nonlinear, and soft matter physics.

[20] I. Farkas, et al. Equilibrium statistical mechanics of network structures, 2004.

[21] Vwani P. Roychowdhury, et al. Percolation search in power law networks: making unstructured peer-to-peer networks scalable, 2004.

[22] B. Kahng, et al. Evolution of scale-free random graphs: Potts model formulation, 2004.

[23] Heiko Rieger, et al. Random walks on complex networks, 2004, Physical review letters.

[24] I-Jeng Wang, et al. Decentralized synchronization protocols with nearest neighbor communication, 2004, SenSys '04.

[25] M. Newman, et al. Statistical mechanics of networks, 2004, Physical review. E, Statistical, nonlinear, and soft matter physics.

[26] Márk Jelasity, et al. Gossip-based aggregation in large dynamic networks, 2005, TOCS.

[27] Alan M. Frieze, et al. Random graphs, 2006, SODA '06.

[28] Anne-Marie Kermarrec, et al. Efficient and adaptive epidemic-style protocols for reliable and scalable multicast, 2006, IEEE Transactions on Parallel and Distributed Systems.

[29] J. Reichardt, et al. Statistical mechanics of community detection, 2006, Physical review. E, Statistical, nonlinear, and soft matter physics.

[30] Oskar Sandberg, et al. Distributed Routing in Small-World Networks, 2006, ALENEX.

[31] J. Kleinberg. Complex networks and decentralized search algorithms, 2006.

[32] V. Latora, et al. Complex networks: Structure and dynamics, 2006.

[33] Soundar Kumara, et al. Search in spatial scale-free networks, 2007.

[34] Márk Jelasity, et al. Firefly-inspired Heartbeat Synchronization in Overlay Networks, 2007, First International Conference on Self-Adaptive and Self-Organizing Systems (SASO 2007).

[35] Marián Boguñá, et al. Navigability of Complex Networks, 2007, ArXiv.

[36] J. A. Almendral, et al. Dynamical and spectral properties of complex networks, 2007, 0705.3216.

[37] Alessandro Vespignani, et al. Dynamical Processes on Complex Networks, 2008.

[38] S. Kolos, et al. The ATLAS Event Monitoring Service—Peer-to-Peer Data Distribution in High-Energy Physics, 2008, IEEE Transactions on Nuclear Science.

[39] Ming Zhong, et al. The Convergence-Guaranteed Random Walk and Its Applications in Peer-to-Peer Networks, 2008, IEEE Transactions on Computers.

[40] Jurgen Kurths, et al. Synchronization in complex networks, 2008, 0805.2976.

[41] Tom M Mitchell, et al. Mining Our Reality, 2009, Science.

[42] Chrysanthos Dellarocas, et al. Harnessing Crowds: Mapping the Genome of Collective Intelligence, 2009.

[43] Dmitri V. Krioukov, et al. Greedy Forwarding in Dynamic Scale-Free Networks Embedded in Hyperbolic Metric Spaces, 2008, 2010 Proceedings IEEE INFOCOM.

[44] Dominik Benz, et al. Community Assessment Using Evidence Networks, 2010, MSM/MUSE.

[45] Martina Zitterbart, et al. Overlay-Netze als Innovationsmotor im Internet, 2010, Informatik-Spektrum.

[46] Friedemann Mattern, et al. Vom Internet der Computer zum Internet der Dinge, 2010, Informatik-Spektrum.

[47] Ingo Scholtes, et al. Epidemic Self-Synchronization in Complex Networks of Kuramoto oscillators, 2010, Adv. Complex Syst.

[48] Roberto Baldoni, et al. Coupling-Based Internal Clock Synchronization for Large-Scale Dynamic Distributed Systems, 2010, IEEE Transactions on Parallel and Distributed Systems.

[49] Andreas Hotho, et al. Datenschutz im Web 2.0 am Beispiel des sozialen Tagging-Systems BibSonomy, 2010, Informatik-Spektrum.

[50] Ingo Scholtes, et al. Distributed Creation and Adaptation of Random Scale-Free Overlay Networks, 2010, 2010 Fourth IEEE International Conference on Self-Adaptive and Self-Organizing Systems.

[51] A. Kaplan, et al. Users of the world, unite! The challenges and opportunities of Social Media, 2010.

[52] Stefan Rüping, et al. Privacy-Preserving Data-Mining, 2010, Informatik-Spektrum.

[53] Tom A. B. Snijders, et al. Social Network Analysis, 2011, International Encyclopedia of Statistical Science.

[54] Dominik Benz, et al. Enhancing Social Interactions at Conferences, 2011, it Inf. Technol.

[55] Martin Atzmüller, et al. Efficient Descriptive Community Mining, 2011, FLAIRS.

[56] Andreas Hotho, et al. Face-to-Face Contacts during a Conference: Communities, Roles, and Key Players, 2011.

[57] Walter Willinger, et al. Mathematics and the Internet: A Source of Enormous Confusion and Great Potential, 2009, The Best Writing on Mathematics 2010.

[58] Ingo Scholtes. Harnessing Complex Structures and Collective Dynamics in Large Networked Computing Systems, 2011.

[59] A. Barrat, et al. Simulation of an SEIR infectious disease model on the dynamic contact network of conference attendees, 2011, BMC medicine.