Using context-window overlapping in synonym discovery and ontology extension

This paper describes a new, unsupervised procedure, called context-window overlapping, for calculating the semantic distance between two terms. It is based on the distributional semantics hypothesis and, in particular, on the observation that synonymous words should be interchangeable in every context, while hyponyms can be substituted by their hyperonyms in most contexts. The procedure has been applied to synonym identification and to ontology extension. For the first task, it has been evaluated on the 80 synonym test questions from the TOEFL, which already constitute a standard test set for this problem, and it attains results similar to those of most other non-ensemble procedures. Interestingly, it clearly outperforms Latent Semantic Analysis, another procedure grounded in the distributional semantics hypothesis. For ontology enrichment, the results obtained are promising, although there is still considerable room for improvement. We draw conclusions from these results and outline several possibilities for future work.
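The interchangeability idea behind the procedure can be illustrated with a minimal sketch. This is not the measure defined in the paper: it assumes, purely for illustration, that a context is the tuple of words immediately to the left and right of a term, and that overlap is scored with a Jaccard ratio. The function names `context_windows` and `window_overlap`, the `width` parameter, and the toy corpus are all hypothetical.

```python
def context_windows(tokens, term, width=2):
    """Collect the set of (left words, right words) contexts around `term`.

    Assumption: a context is the `width` tokens on each side of the term.
    """
    contexts = set()
    for i, tok in enumerate(tokens):
        if tok == term:
            left = tuple(tokens[max(0, i - width):i])
            right = tuple(tokens[i + 1:i + 1 + width])
            contexts.add((left, right))
    return contexts


def window_overlap(tokens, a, b, width=2):
    """Jaccard overlap between the context sets of terms `a` and `b`.

    Assumption: shared contexts are counted as evidence that the two terms
    are interchangeable; the paper's actual scoring may differ.
    """
    ca = context_windows(tokens, a, width)
    cb = context_windows(tokens, b, width)
    if not ca or not cb:
        return 0.0  # a term that never occurs shares no context
    return len(ca & cb) / len(ca | cb)


# Toy corpus (hypothetical): "big" and "large" both occur in "the _ dog".
corpus = ("the big dog barked . the large dog barked . "
          "the big house stood . the red car stopped .").split()

print(window_overlap(corpus, "big", "large", width=1))
print(window_overlap(corpus, "big", "red", width=1))
```

In this toy setting, two terms that can fill the same slot share at least one context and score above zero, whereas terms that never appear in a common window score zero. A realistic setting would need a large corpus and some treatment of sparse or very frequent contexts.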
