Literary Exploration Machine: A Web-Based Application for Textual Scholars

This paper presents the design of a web-based application for textual scholars. The goal of the project is to create a complex and stable research environment in which scholars can upload the texts they analyse and either explore them with a suite of dedicated tools or transform them into a different format (e.g. text, table, list, spreadsheet). The latter functionality is especially important for research on Polish texts, whose rich morphology and weakly constrained word order pose difficulties for tools built for English, because it allows such texts to be processed further with those tools. The project builds on the existing CLARIN-PL applications and supplements them with new functionalities.
