Archiving and Analyzing Tweets and Webpages with the DLRL Hadoop Cluster

Sunshin Lee
Dept. of Computer Science, Virginia Tech
Blacksburg, VA 24061 USA
sslee777@vt.edu

Edward A. Fox
Dept. of Computer Science, Virginia Tech
Blacksburg, VA 24061 USA
fox@vt.edu

ABSTRACT

In the Integrated Digital Event Archive and Library (IDEAL) [1] project, we research the next-generation integration of digital libraries and event archiving. The project team has been collecting Internet information, such as tweets and webpages, related to crises and tragedies as well as to recovery and government/community events. This poster describes the Hadoop cluster in the Digital Library Research Laboratory (DLRL) of the Department of Computer Science, Virginia Tech, and its use in archiving and analyzing tweets and webpages.