Computational Linguistics for Mere Mortals - Powerful but Easy-to-use Linguistic Processing for Scientists in the Humanities

Delivering linguistic resources and easy-to-use methods to a broad public in the humanities is a challenging task. On the one hand, users rightly demand easy-to-use interfaces; on the other hand, they want access to the full flexibility and power of the functions being offered. Although a growing number of excellent systems offer convenient means of using linguistic resources and methods, they usually focus on a specific domain, such as corpus exploration or text categorization. Architectures that address a broad scope of applications are still rare. This article introduces the eHumanities Desktop, an online system for corpus management, processing, and analysis that aims to bridge the gap between powerful command-line tools and intuitive user interfaces.
