Supporting Research Data Collection from YouTube with TubeKit

ABSTRACT We present TubeKit, a toolkit for query-based crawling of YouTube. It is a collection of tools that lets users build their own crawler, which crawls YouTube from a set of seed queries and collects up to 17 different attributes. TubeKit assists in every phase of this process, from creating the database to providing access to the collected data through browsing and searching interfaces. We further demonstrate how we used the toolkit to collect election-related data from YouTube for nearly two years, and we present some analysis of that data.