Enhancing Access to Media Collections and Archives Using Computational Linguistic Tools

In this paper, we outline the strategies, methodology, and infrastructure needed to bring advanced computational linguistic tools to researchers and archivists in the humanities. We discuss three use cases for the Language Application Grid (LAPPS), an open, web-based infrastructure that provides interoperable access to hundreds of computational linguistic (CL) component web services, together with facilities for multistep analyses via tool pipelining, performance evaluation, and resource delivery. The use cases are: CL analysis of corpora restricted under copyright; the challenge posed by radio and television media collections; and the use of LAPPS to assist archivists in their collection and cataloguing efforts. We believe that the adoption and use of CL platforms such as LAPPS by the digital humanities (DH) community will help foster better communication, sharing, and research between the two communities.
