A Resource-Efficient Method for Crawling Swarm Information in Multiple BitTorrent Networks

BitTorrent is one of the most popular peer-to-peer (P2P) file-sharing applications in the world. Each BitTorrent network is called a swarm, and millions of peers may join multiple swarms. Because swarms are large and complex, measuring all the swarms in the world requires substantial resources (PC servers, Internet bandwidth, etc.). For this reason, existing work has been forced to measure only a subset of swarms and therefore captures only part of the overall picture. In this paper, we propose a resource-efficient method for crawling multiple BitTorrent swarms with limited resources, such as a single PC server. In the proposed method, our crawler avoids collecting redundant swarm information without saturating WAN access links or consuming excessive processing resources. We also use a number of techniques to efficiently crawl all the participating peers of multiple swarms. We crawl over 4.3 million unique .torrent files (small files that store the metadata used in BitTorrent) and 48,000 tracker addresses. We can crawl 4.3 million swarms within an hour, and we obtain 24 swarm snapshots and 10 million unique peers in a day.
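The abstract does not describe the crawler's internals, so the following is only a minimal Python sketch of two ideas it mentions, not the authors' implementation. It assumes each swarm is keyed by its info-hash and a single tracker URL, and that a peer is identified by its (IP address, port) pair. The names `group_by_tracker`, `PeerIndex`, and `required_rate` are hypothetical. Grouping swarms by tracker shows how 4.3 million torrents collapse onto far fewer tracker endpoints, and the peer index records a peer seen in several swarms only once while still remembering every swarm it joined.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumption: a peer is identified by its (IP address, port) pair.
Peer = Tuple[str, int]


def group_by_tracker(torrents: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (info_hash, tracker_url) pairs by tracker (hypothetical helper).

    Grouping lets a crawler schedule all swarms of one tracker together
    instead of treating every torrent as an independent target.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for info_hash, tracker in torrents:
        groups[tracker].append(info_hash)
    return dict(groups)


class PeerIndex:
    """Store each peer once, plus the set of swarms it was seen in (hypothetical)."""

    def __init__(self) -> None:
        self.swarms_of: Dict[Peer, Set[str]] = defaultdict(set)

    def add(self, info_hash: str, peers: Iterable[Peer]) -> int:
        """Record the peers returned for one swarm; return how many were new."""
        new = 0
        for peer in peers:
            if peer not in self.swarms_of:  # membership test does not create an entry
                new += 1
            self.swarms_of[peer].add(info_hash)
        return new

    def unique_peers(self) -> int:
        """Number of distinct peers seen across all swarms."""
        return len(self.swarms_of)


def required_rate(swarms: int, seconds: float) -> float:
    """Average number of swarms that must be crawled per second."""
    return swarms / seconds


if __name__ == "__main__":
    index = PeerIndex()
    index.add("aa", [("10.0.0.1", 6881), ("10.0.0.2", 6881)])
    index.add("bb", [("10.0.0.1", 6881)])  # already-known peer, not counted again
    print(index.unique_peers())
    print(round(required_rate(4_300_000, 3600)))
```

Crawling 4.3 million swarms within an hour implies an average of roughly 1,194 swarms per second, and one full pass per hour is consistent with the 24 snapshots per day reported above.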