Improving the robustness, manageability, and performance of internet-scale applications via query and actuation
Over the past few years, researchers have devoted significant attention to developing application-internal mechanisms for providing robustness, self-management, and scalable performance in Internet-scale applications such as content distribution networks [63, 128, 48, 30], peer-to-peer storage [71, 38, 84, 110, 64] and filesharing [75, 65], distributed games [17], and scientific Grid applications [45, 89, 139, 3, 59, 124, 40]. But there has been little focus on developer and operator tools, evaluation techniques, and services designed to improve these application properties. Such tools and services, because they run outside of any one application, offer the potential to improve these qualities across many applications. This thesis investigates how to build tools and services to fill this gap.
In particular, we hypothesize that key building blocks for such tools and services are distributed querying and actuation. This dissertation therefore addresses the following question: How can distributed querying and actuation be used to enhance the robustness, manageability, and performance of Internet-scale applications?
We showcase these techniques in the Application Control and Monitoring Environment (ACME) tool for robustness benchmarking and application management, and the Scalable Wide-Area Resource Discovery (SWORD) service for resource discovery and service placement. ACME uses distributed queries over “sensors” to collect data about an application's performance and internal state, and it uses “actuators” to automatically invoke user-specified actions tied to conditions over the sensor data, thereby implementing a robustness benchmarking scenario or management policy. SWORD uses distributed queries to collect information about free node and network resources; combined with a user-supplied specification of an application's resource needs and desires, this information lets SWORD advise on where to place instances of that application. Once the application has been deployed, actuation takes the form of informing the application when node or network conditions have changed such that migration is necessary to continue complying with the application's resource needs, and/or to optimize the desirability of the deployment configuration.
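The query-then-actuate pattern described above can be illustrated with a minimal Python sketch. It is not ACME's or SWORD's actual interface: the names `Rule`, `query`, and `evaluate`, the sensor names `load` and `free_mem_mb`, and the node identifiers are all hypothetical, and the "distributed" query is simulated over an in-memory dictionary rather than a real wide-area network.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type: sensor name -> latest value reported by one node.
Reading = Dict[str, float]


@dataclass
class Rule:
    """A user-specified action tied to a condition over sensor data."""
    name: str
    condition: Callable[[Reading], bool]  # predicate over one node's readings
    action: Callable[[str], str]          # stand-in actuator, given a node id


def query(nodes: Dict[str, Reading], sensor: str) -> Dict[str, float]:
    """Collect one sensor's value from every node that reports it."""
    return {node: r[sensor] for node, r in nodes.items() if sensor in r}


def evaluate(nodes: Dict[str, Reading], rules: List[Rule]) -> List[str]:
    """Invoke each rule's action on every node whose readings satisfy it."""
    log = []
    for node in sorted(nodes):
        for rule in rules:
            if rule.condition(nodes[node]):
                log.append(rule.action(node))
    return log


# Simulated sensor data for two nodes (illustrative values only).
nodes = {
    "node-a": {"load": 0.4, "free_mem_mb": 900.0},
    "node-b": {"load": 3.1, "free_mem_mb": 120.0},
}

# One management policy: restart the application on overloaded nodes.
rules = [
    Rule(
        name="restart-on-overload",
        condition=lambda r: r.get("load", 0.0) > 2.0,
        action=lambda node: f"restart {node}",
    )
]

actions = evaluate(nodes, rules)
```

In this sketch the condition and the action are kept as separate callables, so the same rule list could stand for either a benchmarking scenario (inject a fault when a condition holds) or a management policy (repair when a condition holds).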
ACME improves application scalability, robustness, and performance by facilitating robustness benchmarking before deployment, and it improves manageability by allowing operators to define actions to be taken automatically in response to anomalous conditions detected in a deployed application. SWORD improves performance by placing application instances on nodes with sufficient resources, and it improves robustness to performance variability by advising applications when and how to migrate. Moreover, both ACME and SWORD themselves use techniques that allow them to operate successfully in large-scale Internet environments, which pose their own scalability, robustness, and performance challenges. In addition to describing the design, implementation, and performance of ACME and SWORD, we demonstrate the usefulness of these systems for robustness benchmarking and intelligent service placement for real applications, and we report lessons learned from operating SWORD continuously on PlanetLab for more than six months.