Paper information: Proceedings of the 12th Edition of the Konvens Conference, Hildesheim, Germany, October 8-10, 2014

The dependency of word similarity in vector space models on word frequency has been noted in a few studies but has received very little attention. We study the influence of word frequency on a set of 10,000 randomly selected word pairs for a number of combinations of feature weighting schemes and similarity measures. We find that, for all methods except the one that uses singular value decomposition (SVD) to reduce the dimensionality of the feature space, the similarity of a word pair is determined to a large extent by the frequency of its words. In a binary classification task that separates pairs of synonyms from pairs of unrelated words, we find that correcting for this frequency bias improves the results for all similarity measures.
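The abstract does not specify which weighting schemes, similarity measures, or bias correction were used. As a rough illustration only, the sketch below assumes positive PMI (PPMI) as the feature weighting and cosine as the similarity measure. It also assumes a simple least-squares correction: raw similarity is regressed on the log of the product of the two word frequencies over a sample of random pairs, and the residual is taken as the frequency-corrected score. These choices and the helper names `ppmi`, `cosine`, and `fit_frequency_correction` are hypothetical and not necessarily the paper's method. The SVD variant is not shown.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

# Sparse vector: context feature -> weight.
Vector = Dict[str, float]


def ppmi(counts: Dict[str, Dict[str, float]]) -> Dict[str, Vector]:
    """Weight a word -> context -> co-occurrence count table with positive PMI.

    PPMI is only one possible feature weighting scheme (an assumption here).
    """
    total = sum(n for ctx in counts.values() for n in ctx.values())
    word_totals = {w: sum(ctx.values()) for w, ctx in counts.items()}
    ctx_totals: Dict[str, float] = {}
    for ctx in counts.values():
        for c, n in ctx.items():
            ctx_totals[c] = ctx_totals.get(c, 0.0) + n
    weighted: Dict[str, Vector] = {}
    for w, ctx in counts.items():
        vec: Vector = {}
        for c, n in ctx.items():
            if n <= 0:
                continue
            # PMI = log( P(w, c) / (P(w) * P(c)) ); keep only positive values.
            pmi = math.log(n * total / (word_totals[w] * ctx_totals[c]))
            if pmi > 0:
                vec[c] = pmi
        weighted[w] = vec
    return weighted


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def fit_frequency_correction(
    samples: Sequence[Tuple[float, int, int]],
) -> Callable[[float, int, int], float]:
    """Fit sim ~ a + b * log(f1 * f2) by ordinary least squares.

    `samples` holds (similarity, freq1, freq2) triples, e.g. for randomly
    drawn word pairs. The returned function maps a raw similarity to its
    residual, i.e. the part not explained by word frequency. This linear
    correction is a hypothetical choice, not taken from the paper.
    """
    xs = [math.log(f1 * f2) for _, f1, f2 in samples]
    ys = [s for s, _, _ in samples]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    a = my - b * mx

    def corrected(sim: float, f1: int, f2: int) -> float:
        return sim - (a + b * math.log(f1 * f2))

    return corrected


if __name__ == "__main__":
    # Toy co-occurrence counts (made-up illustration data, not from the paper).
    counts = {
        "car": {"drive": 10, "road": 8, "fast": 3},
        "auto": {"drive": 2, "road": 1},
        "apple": {"eat": 6, "red": 2, "fast": 1},
    }
    vecs = ppmi(counts)
    print("cos(car, auto) =", round(cosine(vecs["car"], vecs["auto"]), 3))

    # Hypothetical (similarity, freq1, freq2) triples for random pairs.
    sample = [(0.05, 3, 9), (0.12, 21, 9), (0.30, 21, 40), (0.41, 60, 55)]
    correct = fit_frequency_correction(sample)
    print("corrected score:", round(correct(0.30, 21, 40), 3))
```

Comparing residuals instead of raw scores means a pair no longer scores high or low merely because of how frequent its two words are. Whether this particular linear correction matches the one evaluated in the paper is an assumption.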

Josef Ruppenhofer
