A study of automation from seed URL generation to focused web archive development: the CTRnet context
During emergencies and disasters, massive amounts of web resources are generated and shared. Because those resources change rapidly, it is important to start archiving them as soon as a disaster occurs. This led us to develop a prototype system that constructs archives with minimal human intervention, using seed URLs extracted from tweet collections. We present the details of the prototype system. For evaluation, we applied it to five tweet collections that had been developed in advance. We also identify five categories of non-relevant files and conclude with a discussion of findings from the evaluation.
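As a rough illustration of the seed URL step, the following is a minimal sketch that pulls URLs out of tweet text and ranks them by how many tweets mention them. The abstract does not describe the actual extraction method, so the function name `extract_seed_urls`, the regular expression, the `top_n` parameter, and the frequency-based ranking are all assumptions made for this example, not the prototype's implementation.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical pattern: any http(s) URL up to the next whitespace or quote.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_seed_urls(tweets, top_n=10):
    """Return up to top_n URLs, ordered by the number of tweets citing them.

    Assumption: frequency of mention is used as a stand-in for seed quality.
    """
    counts = Counter()
    for text in tweets:
        urls_in_tweet = {}  # dict keeps first-seen order and removes repeats
        for raw in URL_PATTERN.findall(text):
            url = raw.rstrip(".,;:!?)")  # drop trailing sentence punctuation
            try:
                has_host = bool(urlsplit(url).netloc)
            except ValueError:
                has_host = False  # malformed URL, skip it
            if has_host:
                urls_in_tweet[url] = True
        counts.update(urls_in_tweet.keys())  # count each URL once per tweet
    return [url for url, _ in counts.most_common(top_n)]
```

The returned list could then be handed to a crawler as its starting points. Shortened links (common in tweets) would need to be resolved to their targets first, which this sketch does not attempt.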