Proceedings of the 12th Edition of the Konvens Conference, Hildesheim, Germany, October 8-10, 2014

The dependency of word similarity in vector space models on word frequency has been noted in a few studies but has received very little attention. We study the influence of word frequency on a set of 10 000 randomly selected word pairs for a number of combinations of feature weighting schemes and similarity measures. We find that, for all methods except the one that uses singular value decomposition to reduce the dimensionality of the feature space, the similarity of word pairs is determined to a large extent by the frequency of the words. In a binary classification task that separates pairs of synonyms from pairs of unrelated words, we find that the results can be improved for all similarity measures when we correct for the frequency bias.
