YouTube Data Collection Using Parallel Processing

Several studies have identified social media platforms as significant data sources for studying human behavior and gaining situational awareness about events and crises. YouTube, one of the largest social media platforms, provides a Data API that enables the collection of data on YouTube channels and videos for use in such studies. Current sequential methods for processing YouTube Data API requests are time-consuming. In this paper, we develop an implementation that uses Python’s multiprocessing to process YouTube Data API requests in parallel. Our tests indicate that multiprocessing improves performance by 400%. These improvements reduce computation time by utilizing multi-threaded CPU architectures.
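The general idea of distributing API requests across worker processes can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the function names `fetch_video_metadata` and `collect_parallel`, the placeholder video IDs, and the default of four worker processes are all assumptions made for illustration, and the request itself is replaced by a stub so that the example runs without network access or an API key.

```python
from multiprocessing import Pool


def fetch_video_metadata(video_id):
    """Hypothetical stand-in for a single YouTube Data API request.

    A real version would issue an HTTP request for this video ID; here we
    return a placeholder record so the sketch runs offline.
    """
    return {"video_id": video_id, "title": f"title-for-{video_id}"}


def collect_sequential(video_ids):
    """Baseline: process one request after another."""
    return [fetch_video_metadata(video_id) for video_id in video_ids]


def collect_parallel(video_ids, processes=4):
    """Distribute the requests across a pool of worker processes.

    Pool.map preserves input order, so the results line up with video_ids.
    The worker count of 4 is an assumed default, not a measured optimum.
    """
    with Pool(processes=processes) as pool:
        return pool.map(fetch_video_metadata, video_ids)


if __name__ == "__main__":
    # Placeholder IDs for illustration only.
    ids = [f"video{i}" for i in range(8)]
    results = collect_parallel(ids)
    print(f"collected {len(results)} records")
```

Because `Pool.map` returns results in input order, the parallel version can replace the sequential loop without changing downstream processing. The speedup obtained in practice depends on the number of available cores, network latency, and API quota limits.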