Parallel Crawling and Capturing for On-Line Auction

The on-line auction is one of the most successful types of electronic marketplace and has been the subject of many academic studies. In recent years, empirical research on on-line auctions has flourished because large amounts of high-quality bid data are available from on-line auction sites. Researchers can use these data to provide buyers and sellers with useful information, help prevent fraud, and reduce purchase costs. However, the growing volume of bid data has made data collection increasingly complex and time-consuming, and no existing resources adequately support this work. This study therefore focuses on the parallel crawling and capturing of on-line auction data from the perspective of automated agents, to help researchers collect auction data more effectively. The issues addressed include the parallel crawling architecture, crawling strategies, content capturing strategies, and the implementation of a prototype system. Finally, we conduct an empirical experiment on eBay U.S. and Ruten Taiwan to evaluate the performance of our parallel crawling and capturing approach. The results show that the methodology works in a real-world setting and performs well.