Parallel Processing Videos in Very Large Digital Libraries

Nowhere are the ‘growing pains’ of very large-scale digital libraries more pronounced than in collections containing multimedia data. Such collections not only contain large numbers of items, but also push the boundaries of scale in terms of storage space and processing expense. In this paper we explore how open-source parallel processing libraries and techniques, previously developed for and applied to textual content, can benefit multimedia digital libraries. We provide a real-world use case of ingesting video into the ReplayMe! system, an extension of the Greenstone digital library software that simultaneously records and ingests all of the free-to-air television channels available in New Zealand. The current ingest of video in ReplayMe! is intentionally light because of processing time constraints on the single-processor architecture it was developed on. The work reported here investigates how this system can be scaled up to include the conversion of the broadcast video transport format to a suitable streaming format (MP4) and the automatic extraction of keyframes based on content analysis, while still keeping pace with real time. By applying parallel processing on a cluster of sixteen desktop computers, the paper shows how this processing time can be significantly reduced compared to the equivalent computation conducted serially. We then generalize the work and show how the same basic techniques can be applied to other common digital library software, such as DSpace, to provide similar advantages when dealing with processor-intensive content.
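
The ingest workload described above (one transcode to MP4 plus one keyframe extraction per recording, with independent recordings processed in parallel) can be illustrated with a minimal sketch. It is not the ReplayMe! implementation: the abstract does not name the tools used, so the use of the `ffmpeg` command line, the scene-change threshold of 0.4, the file naming, and the function names `build_commands`, `ingest_one`, and `ingest_all` are all assumptions made for illustration. A thread pool on one machine stands in for the sixteen-node cluster.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_commands(ts_path, out_dir):
    """Return two ffmpeg command lines (assumed tooling) for one recording."""
    src = Path(ts_path)
    out = Path(out_dir)
    mp4 = out / (src.stem + ".mp4")
    frames = out / (src.stem + "_%04d.jpg")
    # Assumption: H.264 video and AAC audio as the MP4 streaming format.
    transcode = ["ffmpeg", "-y", "-i", str(src),
                 "-c:v", "libx264", "-c:a", "aac", str(mp4)]
    # Assumption: scene-change detection as the content-analysis step.
    # Newer ffmpeg builds prefer "-fps_mode vfr" over "-vsync vfr".
    keyframes = ["ffmpeg", "-y", "-i", str(src),
                 "-vf", "select='gt(scene,0.4)'", "-vsync", "vfr",
                 str(frames)]
    return [transcode, keyframes]


def ingest_one(ts_path, out_dir, runner=subprocess.run):
    """Run both steps for a single recording and return its path."""
    for cmd in build_commands(ts_path, out_dir):
        runner(cmd, check=True)
    return ts_path


def ingest_all(ts_paths, out_dir, workers=16, runner=subprocess.run):
    """Process independent recordings concurrently.

    Threads are sufficient here because the heavy work happens in the
    external processes; each worker mostly waits on its subprocess.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda p: ingest_one(p, out_dir, runner),
                             ts_paths))
```

Calling `ingest_all(["tv1.ts", "tv2.ts"], "out")` would process both recordings at once, provided `ffmpeg` is installed and the `out` directory exists. The `runner` parameter exists only so the scheduling logic can be exercised without running `ffmpeg`; on a real cluster the same per-recording task would be dispatched to separate machines rather than threads.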
