The Importance of Aggregation

In this paper, we define aggregation as the ability to summarize information. In the area of sensor networks [16.2], it is also referred to as data fusion. It is the basis for scalability for many, if not all, large networking services. For example, address aggregation allows Internet routing to scale. Without it, routing tables would need a separate entry for each Internet address. Besides the memory this would require, populating the tables would be all but impossible. DNS also makes extensive use of aggregation, allowing domain-name-to-attribute mappings to be resolved in a small number of steps. Many basic distributed paradigms and consistency mechanisms are based on aggregation. For example, synchronization based on voting requires votes to be counted. Aggregation is also a standard service in databases. Using SQL queries, users can explicitly aggregate data in one or more tables in a variety of ways.

With so many examples of aggregation in networked systems, it is surprising that no comparable standard exists there. Instead, each networked service relies on its own implicit, built-in mechanisms for aggregation. This results in a number of problems. First, these mechanisms often require a fair amount of configuration, which is not shared and must be done separately for each service. Second, the configuration is often quite static and does not adapt well to dynamic growth or to failures that occur in the network. Finally, the implementations are complex, yet the code cannot be reused.

Only a few general services are available for aggregation in networked systems. “Mr. Fusion” [16.3] is a recent aggregation service intended for use with CORBA. Based on a voting framework [16.1], the Fusion Core collects ballots and summarizes them once enough ballots have been collected. The output of the aggregation can be multidimensional and is represented as a hierarchical data cube.
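The ballot-counting pattern mentioned above, where ballots are gathered and summarized only once enough of them have arrived, can be illustrated with a minimal sketch. It is not the interface of Mr. Fusion or of the voting framework it builds on; the class `BallotBox`, its `quorum` parameter, and the `cast` method are hypothetical names chosen for this illustration, and the one-ballot-per-voter rule is an assumption.

```python
from collections import Counter
from typing import Dict, Hashable, Optional


class BallotBox:
    """Hypothetical collector that summarizes ballots once a quorum is reached."""

    def __init__(self, quorum: int) -> None:
        if quorum < 1:
            raise ValueError("quorum must be at least 1")
        self.quorum = quorum
        self._ballots: Dict[str, Hashable] = {}

    def cast(self, voter: str, choice: Hashable) -> Optional[Counter]:
        """Record one ballot per voter; return the tally once enough have arrived."""
        # Assumption: a voter who votes again replaces its earlier ballot.
        self._ballots[voter] = choice
        if len(self._ballots) < self.quorum:
            return None  # not enough ballots yet, so nothing is summarized
        # The aggregate: a count per choice instead of the individual ballots.
        return Counter(self._ballots.values())


box = BallotBox(quorum=3)
assert box.cast("a", "commit") is None
assert box.cast("b", "abort") is None
tally = box.cast("c", "commit")
winner, votes = tally.most_common(1)[0]
print(winner, votes)  # prints "commit 2"
```

The point of the sketch is that consumers see only the tally, whose size depends on the number of distinct choices rather than on the number of voters.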
Cougar [16.4] is a sensor database system that supports SQL aggregation queries over the attributes of distributed sensors. Some other sensor network systems, such as Directed Diffusion [16.5], have limited support for data aggregation as well.

We have also developed an aggregation facility, called Astrolabe [16.6], for use by networked services. It resembles DNS in that it organizes hosts in a domain hierarchy and associates attributes with each domain. Unlike in DNS, the attributes of a non-leaf domain are generated by SQL aggregation queries over its child domains, and new attributes are easily introduced. Astrolabe can be customized for new applications by specifying additional aggregation queries. The implementation of Astrolabe is peer-to-peer and does not involve any servers. To date, we have used Astrolabe to

∗ This research was funded in part by DARPA/AFRL-IFGA grant F30602-99-1-0532 and in part by the AFRL/Cornell Information Assurance Institute.