Traveling through Space and Time, or: Making Historical Travelogues Accessible
Investigating perceptions of Otherness is the overarching goal of the Travelogues project. It studies a corpus of thousands of recently digitized travelogues, held by the Austrian National Library and dating back to the 16th century. Driven by an interdisciplinary team of historians and data scientists, the project aims to make knowledge that is currently hidden in a huge text corpus accessible to researchers. In the current, initial project phase, we explore how statistical methods, such as word embeddings, can be used to assess the structure and semantics of large text corpora in order to make those resources accessible. We developed an initial methodology that combines visual and statistical cues to identify possible starting points for a more fine-grained exploration of the corpus. Ultimately, this data-driven approach is expected to yield new and possibly unexpected insights from resources that were previously de facto inaccessible.
[1] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.
[2] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.
[3] Petr Sojka, et al. Software Framework for Topic Modelling with Large Corpora, 2010.
[4] Geoffrey E. Hinton, et al. Visualizing Data using t-SNE, 2008.