Real-Time Data Harvesting Method for Czech Twitter

This paper deals with the automatic analysis of Czech social media. The main goal is to propose an approach for harvesting interesting Czech-language messages from Twitter at a high download speed. The method uses user lists to discover potentially interesting tweets to download. It is motivated by two observations: only about 20% of Twitter users post informative messages, whereas the remaining 80% do not, and the “important” users can be identified through user lists. The experimental results show that the proposed method is very efficient: it harvests about 6 times more data than the other approaches. The approach is intended to be integrated into an experimental system for the Czech News Agency that monitors the current data flow on Twitter, downloads messages in real time, analyzes them, and extracts relevant events.
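
As an illustration only, the idea of identifying “important” users through user lists could be sketched as follows. The abstract does not state the actual selection criterion, so this sketch makes two assumptions: users who appear on more lists are treated as more likely to be informative, and the 20% figure is reused as a cutoff. The function names and the toy data are hypothetical, and no Twitter API call is made.

```python
from collections import Counter
from typing import Dict, Iterable, List


def rank_users_by_list_membership(user_lists: Dict[str, Iterable[str]]) -> List[str]:
    """Return user names ordered by how many lists include them (most first)."""
    counts = Counter()
    for members in user_lists.values():
        # Count each user at most once per list.
        counts.update(set(members))
    # Sort by descending count, then by name so that ties are deterministic.
    return sorted(counts, key=lambda user: (-counts[user], user))


def select_top_fraction(ranked: List[str], fraction: float = 0.2) -> List[str]:
    """Keep the top `fraction` of ranked users, but at least one if any exist."""
    if not ranked:
        return []
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical toy data: list name -> member screen names (not real accounts).
example_lists = {
    "czech-news": ["alice", "bob", "carol"],
    "politics-cz": ["alice", "dave"],
    "sport-cz": ["alice", "bob", "erin"],
}

# Assumption: list-membership count stands in for "importance".
ranked_users = rank_users_by_list_membership(example_lists)
selected_users = select_top_fraction(ranked_users, fraction=0.2)
```

In a real harvester, the selected users would be the accounts whose tweets are downloaded first; how the lists themselves are discovered is outside the scope of this sketch.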