Early measurements of a cluster-based architecture for P2P systems

Peer-to-peer applications such as Napster [4], Freenet [1], and Gnutella [2], [7] have gained much attention recently. These applications are mainly designed and used for largescale sharing of MP3 files. In such systems, end-hosts self-organize into an overlay network and share content with each other. Compared to the traditional client-server model, files are served in a distributed manner and replicated among the network on demand. Since hosts participating in peer-to-peer (P2P) networks also devote some computing resources, such systems scale with the number of hosts in terms of hardware, bandwidth, and disk space. With the wide deployment of P2P applications, the P2P traffic is becoming a growing portion of the Internet traffic. There has been very little examination of P2P traffic patterns and how they differ from traditional service models. Studying and understanding P2P traffic is thus important to provide efficient application-level content location and routing within the network. The existing applications use their own approach to do content location and routing and none of them are scalable. Napster uses a centralized server to locate content, while Gnutella clients broadcast queries to all their neighbors. [8] discusses the query locality observed in Gnutella traces and suggests caching as a short-term approach to increase Gnutella’s scalability. Recent designs such as CAN [5], Chord [9], Pastry [6], and Tapestry [10] propose distributed indexing schemes based on hashing to locate content. These systems assume a flat content delivery mesh. Each object’s location is stored at one or more nodes selected deterministically by a uniform hash function; queries for the object will be routed incrementally to the node. Although hash functions can help locate content deterministically, they lack the flexibility of keyword searching—a useful operation to find content without prior knowledge of exact object names. There is no real deployment at present and thus no measurement information is available for understanding the usability and scalability of