RSS-Crawler Enhancement for Blogosphere-Mapping

The massive adoption of social media has given individuals new ways to express their opinions online. The blogosphere, an inherent part of this trend, contains a vast array of information on a wide variety of topics. It is a huge think tank that creates an enormous and ever-changing archive of open source intelligence. The higher-level aim of the research presented here is to mine and model this vast pool of data in order to extract, exploit and describe meaningful knowledge, and thereby to leverage the structures and dynamics of the networks emerging within the blogosphere. Our proprietary, tailor-made feed crawler framework was developed to meet exactly this need. While the main concept as well as the basic techniques and implementation details of the crawler have already been covered in earlier publications, this paper focuses on several recent optimizations of the crawler framework that proved crucial for the performance of the overall framework.
