Selection and context scoping for digital video collections: an investigation of YouTube and blogs

Digital curators must decide which parts of the ever-growing, ever-evolving space of digital information to collect and preserve. The recent explosion of web video on sites such as YouTube presents curators with an even greater challenge: how to sort through and filter a large amount of information to find, assess, and ultimately preserve important, relevant, and interesting video. In this paper, we describe research conducted to help inform the digital curation of online video. Since May 2007, we have been monitoring the results of 57 queries on YouTube related to the 2008 U.S. presidential election. We report results comparing these data to blogs that point to candidate videos on YouTube, and we discuss the effects of query-based harvesting as a collection development strategy.
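One simple way to frame a comparison between query-harvested videos and blog-linked videos is set overlap. The sketch below is an illustrative assumption, not the method used in this study: it treats each query's results and the blog links as sets of YouTube video IDs and reports the shared count, the fraction of blog-linked videos that the queries also captured, and the Jaccard similarity of the two collections. The function name `overlap_report`, the input structure, and the sample IDs are hypothetical.

```python
from collections.abc import Iterable


def overlap_report(
    query_results: dict[str, Iterable[str]], blog_linked: Iterable[str]
) -> dict[str, float]:
    """Compare videos harvested by queries with videos linked from blogs.

    Hypothetical input structure: query_results maps each query string to the
    video IDs it returned; blog_linked holds video IDs found in blog posts.
    """
    # Union of all videos returned by any query (duplicates across queries collapse).
    harvested: set[str] = set()
    for ids in query_results.values():
        harvested.update(ids)

    blog = set(blog_linked)
    shared = harvested & blog
    union = harvested | blog

    return {
        "harvested": len(harvested),
        "blog_linked": len(blog),
        "shared": len(shared),
        # Fraction of blog-linked videos that query-based harvesting also captured.
        "blog_coverage": len(shared) / len(blog) if blog else 0.0,
        # Jaccard similarity: shared videos over all distinct videos in either set.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical queries and video IDs, for illustration only.
    report = overlap_report(
        {"query 1": ["a1", "b2", "c3"], "query 2": ["c3", "d4"]},
        ["b2", "d4", "e5"],
    )
    print(report)
```

In this toy example, four distinct videos are harvested, two of the three blog-linked videos are among them, and the Jaccard similarity is 2/5. A low coverage value under this framing would suggest that blogs point to videos that query-based harvesting misses.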
