Semantic relations in bilingual lexicons

Bilingual lexicons, essential to many NLP applications, can be constructed automatically from parallel or comparable corpora. In this article, we make two contributions to their induction from comparable corpora. The first concerns the creation of these lexicons: we show that seed lexicons can be improved by adding a bootstrapping procedure that uses cross-lingual distributional similarity. The second concerns the evaluation of bilingual lexicons. Evaluation is generally based on translation lexicons, which implicitly assumes that (cross-lingual) synonymy is the semantic relation of primary interest, even though other semantic relations, such as (cross-lingual) hyponymy or cohyponymy, make up a considerable portion of the translation pair candidates proposed by distributional methods. We argue that the focus on synonymy is an oversimplification and that many applications can profit from the inclusion of other semantic relations. We study the effect of these semantic relations on two cross-lingual tasks: the cross-lingual projection of polarity scores and the cross-lingual modeling of selectional preferences. We find that the presence of non-synonymous semantic relations may negatively affect the former task but benefit the latter.
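
As a rough illustration of how a seed lexicon might be extended through bootstrapping with cross-lingual distributional similarity, consider the minimal sketch below. It is not the procedure used in the article: the toy co-occurrence vectors, the mutual-nearest-neighbour acceptance criterion, the `threshold` value, and the function names `cosine`, `project`, and `bootstrap` are all assumptions made for the example.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = sqrt(sum(w * w for w in u.values())) * sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def project(vec, lexicon):
    """Map source-language context features into the target language via the lexicon."""
    out = {}
    for ctx, weight in vec.items():
        if ctx in lexicon:
            tgt = lexicon[ctx]
            out[tgt] = out.get(tgt, 0.0) + weight
    return out

def bootstrap(src_vecs, tgt_vecs, seed, rounds=3, threshold=0.5):
    """Grow the lexicon with mutual nearest neighbours (an assumed criterion)."""
    lexicon = dict(seed)
    for _ in range(rounds):
        # Cross-lingual similarity of every source word to every target word.
        sims = {
            s: {t: cosine(project(sv, lexicon), tv) for t, tv in tgt_vecs.items()}
            for s, sv in src_vecs.items()
        }
        new_pairs = {}
        for s, row in sims.items():
            if s in lexicon:
                continue
            t = max(row, key=row.get)                      # best target for s
            s_back = max(sims, key=lambda x: sims[x][t])   # best source for t
            if s_back == s and row[t] >= threshold:
                new_pairs[s] = t
        if not new_pairs:
            break
        lexicon.update(new_pairs)  # new pairs become usable context features
    return lexicon

# Toy English and German co-occurrence counts (invented for illustration).
src_vecs = {
    "dog": {"bark": 4, "cat": 2},
    "car": {"drive": 5, "engine": 3},
    "leash": {"dog": 3},
}
tgt_vecs = {
    "Hund": {"bellen": 5, "Katze": 1},
    "Auto": {"fahren": 4, "Motor": 2},
    "Leine": {"Hund": 2},
}
seed = {"bark": "bellen", "cat": "Katze", "drive": "fahren", "engine": "Motor"}

lexicon = bootstrap(src_vecs, tgt_vecs, seed)
one_round = bootstrap(src_vecs, tgt_vecs, seed, rounds=1)
```

In this toy setting, "leash" can only be matched to "Leine" once "dog" has been paired with "Hund" in an earlier round, which is the kind of gain a bootstrapping step is meant to provide. Note that a similarity-based criterion like this one does not distinguish synonyms from hyponyms or cohyponyms, which is the evaluation issue the abstract raises.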
