Unsupervised Methods for Learning and Using Semantics of Natural Language

Teaching the computer to understand language is a major goal in the field of natural language processing. In this thesis we introduce computational methods that aim to extract language structure (e.g. grammar, semantics or syntax) from text, which provides the computer with the information needed to understand language. During the last decades, scientific efforts and the increase of computational resources have brought us closer to the goal of understanding language. In order to extract language structure, many approaches train the computer on manually created resources. Most of these so-called supervised methods show high performance when applied to textual data similar to their training data. However, they perform poorly when operating on textual data that differs from the data they were trained on. Whereas training the computer is essential to obtain reasonable structure from natural language, we want to avoid training the computer on manually created resources. In this thesis, we present so-called unsupervised methods, which learn patterns in order to extract structure from textual data directly. These patterns are learned with methods that extract the semantics (meanings) of words and phrases. In comparison to manually built knowledge bases, unsupervised methods are more flexible: they can extract structure from text of different languages or text domains (e.g. finance or medical texts) without requiring manually annotated structure. However, learning structure from text often faces sparsity issues. The reason is that many words in language occur only a few times, and if a word is seen only a few times, no precise information can be extracted from the text it occurs in. Whereas sparsity issues cannot be solved completely, information about most words can be gained by using large amounts of data. In the first chapter, we briefly describe how computers can learn to understand language.
Afterwards, we present the main contributions, list the publications this thesis is based on and give an overview of this thesis. Chapter 2 introduces the terminology used in this thesis and provides background on natural language processing. Then, we characterize the linguistic theory of how humans understand language. Afterwards, we show how the underlying linguistic intuition can be operationalized for computers. Based on this operationalization, we introduce a formalism for representing words and their context. This formalism is used in the following chapters to compute similarities between words. In Chapter 3 we give a brief description of methods in the field of computational semantics that are designed to compute similarities between words. All these methods have in common that they extract a contextual representation for a word that is generated from text. Then, this representation is used to compute similarities between words. In addition, we present examples of word similarities computed with these methods. Segmenting text into its topically related units is intuitively performed by humans and helps to extract connections between words in text. We equip the computer with this ability by introducing a text segmentation algorithm in Chapter 4. This algorithm is based on a statistical topic model, which learns to cluster words into topics solely on the basis of the text. Using the segmentation algorithm, we demonstrate the influence of the parameters provided by the topic model. In addition, our method yields state-of-the-art performance on two datasets. To represent the meaning of words, we use context information (e.g. neighboring words), which is utilized to compute similarities. Whereas we described methods for word similarity computation in Chapter 3, we introduce a generic symbolic framework in Chapter 5.
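The core idea of topic-model-based segmentation can be sketched as follows. This is a minimal illustration, not the thesis's algorithm: it assumes each word has already been assigned a topic ID (e.g. by LDA) and places a boundary wherever the topic distributions of adjacent sentence windows diverge. The window size and threshold are illustrative assumptions.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse topic-count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def segment(sentence_topics, window=2, threshold=0.5):
    """Place a boundary before sentence i when the topic overlap between
    the windows left and right of i drops below the threshold.
    sentence_topics: one list of topic IDs per sentence."""
    boundaries = []
    for i in range(window, len(sentence_topics) - window):
        left = Counter(t for s in sentence_topics[i - window:i] for t in s)
        right = Counter(t for s in sentence_topics[i:i + window] for t in s)
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries

# Two topical units: topic 0 dominates the first three sentences, topic 1 the rest.
doc = [[0, 0, 0], [0, 0, 1], [0, 0, 0], [1, 1, 1], [1, 1, 0], [1, 1, 1]]
print(segment(doc))  # → [3], a boundary between sentences 2 and 3
```

Counting topic IDs instead of surface words is what makes the comparison robust: two windows about the same topic overlap even when they share few literal words.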
As we follow a symbolic approach, we do not represent words using dense numeric vectors but use symbols (e.g. neighboring words or syntactic dependencies) directly. Such a representation is human-readable and is preferred in sensitive applications like the medical domain, where the reasons for decisions need to be provided. This framework enables the processing of arbitrarily large data. Furthermore, it is able to compute the most similar words for all words within a text collection, resulting in a distributional thesaurus. We show the influence of various parameters deployed in our framework and examine the impact of different corpora used for computing similarities. Performing computations based on various contextual representations, we obtain the best results when using syntactic dependencies between words within sentences. However, these syntactic dependencies are predicted by a supervised dependency parser, which is trained on language-dependent and human-annotated resources. To avoid such language-specific preprocessing when computing distributional thesauri, we investigate replacing language-dependent dependency parsers with language-independent unsupervised parsers in Chapter 6. Evaluating the syntactic dependencies from unsupervised and supervised parsers against human-annotated resources reveals that the unsupervised methods cannot compete with the supervised ones. In this chapter we use the predicted structure of both types of parsers as context representation in order to compute word similarities. Then, we evaluate the quality of the similarities, which provides an extrinsic evaluation setup for both unsupervised and supervised dependency parsers. In an evaluation on English text, similarities computed from contextual representations generated with unsupervised parsers do not outperform similarities computed with context representations extracted from supervised parsers.
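A distributional thesaurus built from symbolic contexts can be sketched in a few lines. This is a deliberately crude illustration: it uses neighboring words (tagged with their offset) as context features and ranks candidates by the raw number of shared features, whereas the framework described above scales to arbitrarily large data and weights features by statistical significance.

```python
from collections import defaultdict

def context_features(tokens, position, width=1):
    """Symbolic context of a token: its neighbors, tagged with their offset."""
    feats = set()
    for off in range(-width, width + 1):
        j = position + off
        if off != 0 and 0 <= j < len(tokens):
            feats.add((off, tokens[j]))
    return feats

def build_thesaurus(sentences, top_n=3):
    """For every word, rank all other words by the number of shared
    symbolic context features (a crude overlap score)."""
    features = defaultdict(set)  # word -> set of context features
    for sent in sentences:
        for i, word in enumerate(sent):
            features[word] |= context_features(sent, i)
    thesaurus = {}
    for word, feats in features.items():
        scored = [(other, len(feats & f2))
                  for other, f2 in features.items() if other != word]
        scored = [(w, s) for w, s in scored if s > 0]
        scored.sort(key=lambda x: (-x[1], x[0]))
        thesaurus[word] = scored[:top_n]
    return thesaurus

corpus = [["the", "cat", "sleeps"], ["the", "dog", "sleeps"],
          ["a", "cat", "purrs"]]
dt = build_thesaurus(corpus)
print(dt["cat"])  # → [('dog', 2)]: "dog" shares (-1,'the') and (+1,'sleeps')
```

Because both the features and the similarity evidence are symbols, every entry in the resulting thesaurus can be inspected and justified, which is exactly the human-readability argument made above.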
However, for German we observe the best results when applying context retrieved by the unsupervised parser for computing distributional thesauri. Furthermore, we demonstrate that our framework is capable of combining different context representations, as we obtain the best performance with a combination of both flavors of syntactic dependencies for both languages. Most languages are not composed of single-word terms only, but also contain many multi-word terms that form a unit, called multiword expressions. The identification of multiword expressions is particularly important for semantics: e.g. the term New York has a different meaning than its constituent terms New or York. Whereas most research on semantics avoids handling these expressions, we target the extraction of multiword expressions in Chapter 7. Most previously introduced methods rely on part-of-speech tags and apply a ranking function to rank term sequences according to their multiwordness. Here, we introduce a language-independent and knowledge-free ranking method that uses information from distributional thesauri. In evaluations on English and French textual data, our method achieves the best results in comparison to methods from the literature. In Chapter 8 we apply information from distributional thesauri as features for various applications. First, we introduce a general setting for tackling the out-of-vocabulary problem: the inferior performance of supervised methods on words that are not contained in the training data. We alleviate this issue by replacing these unseen words with their most similar known words, extracted from a distributional thesaurus. Using a supervised part-of-speech tagging method, we show substantial improvements in the classification performance for out-of-vocabulary words on German and English textual data.
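The out-of-vocabulary replacement step can be sketched as follows. This is a minimal illustration under assumed inputs: a toy vocabulary standing in for the tagger's training lexicon and a tiny hand-made thesaurus standing in for a real distributional thesaurus; every unseen token is swapped for its most similar in-vocabulary word before tagging.

```python
def replace_oov(tokens, vocabulary, thesaurus):
    """Replace each token unseen during training with its most similar
    in-vocabulary word; keep the original token otherwise."""
    out = []
    for tok in tokens:
        if tok in vocabulary:
            out.append(tok)
        else:
            # thesaurus[tok]: similar words, most similar first
            known = [w for w in thesaurus.get(tok, []) if w in vocabulary]
            out.append(known[0] if known else tok)
    return out

# Hypothetical entries such as a distributional thesaurus might contain.
thesaurus = {"pooch": ["dog", "puppy"], "felines": ["cats", "pets"]}
vocabulary = {"the", "dog", "cats", "sleeps"}
print(replace_oov(["the", "pooch", "sleeps"], vocabulary, thesaurus))
# → ['the', 'dog', 'sleeps']; the tagger then sees only known words
```

The tagger itself is untouched; only its input is normalized, which is why the scheme works as a general setting rather than a tagger-specific fix.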
The second application introduces a system for replacing words within a sentence with a word of the same meaning. For this application, the information from a distributional thesaurus provides the highest-scoring features. In the last application, we introduce an algorithm that is capable of detecting the different meanings of a word and grouping them into coarse-grained categories, called supersenses. Generating features by means of supersenses and distributional thesauri yields a performance increase when plugged into a supervised system that recognizes named entities (e.g. names of persons, organizations or locations). Further directions for using distributional thesauri are presented in Chapter 9. First, we lay out a method that incorporates background information (e.g. the source of the text collection or sense information) into a distributional thesaurus. Furthermore, we describe an approach for building thesauri for different text domains (e.g. the medical or finance domain) and how they can be combined to obtain both high coverage of domain-specific knowledge and a broad background for the open domain. In the last section we characterize yet another method, suited to enrich existing knowledge bases. All three directions are possible extensions that induce further structure from textual data. The last chapter gives a summary of this work: we demonstrate that, without language-dependent knowledge, a computer can learn to extract useful structure from text by using computational semantics. Due to the unsupervised nature of the introduced methods, we are able to extract new structure from raw textual data. This is especially important for languages for which few manually created resources are available, as well as for special domains, e.g. medical or finance. We have demonstrated that our methods achieve state-of-the-art performance. Furthermore, we have shown their impact by applying the extracted structure in three natural language processing tasks.
We have also applied the methods to different languages and to large amounts of data. Thus, we have not proposed methods suited for extracting structure from a single language only, but methods capable of exploring structure for “language” in general.
