Exploring patterns in dictionary definitions for synonym extraction

The automatic identification of synonyms and other semantically related words has various applications in natural language processing. The two mainstream paradigms to date, lexicon-based and distributional approaches, each have strengths and weaknesses with regard to coverage, complexity, and quality. In this paper, we propose three novel methods (two rule-based methods and one machine learning approach) for identifying synonyms from the definition texts of a machine-readable dictionary. The extracted synonyms are evaluated in two extrinsic experiments and one intrinsic experiment. The evaluation results show that our pattern-based approach achieves the best performance in one of the experiments and satisfactory results in the other, comparable to corpus-based state-of-the-art results.
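The abstract does not list the definition patterns themselves. As a rough illustration of what a pattern-based extractor over dictionary glosses can look like, the following Python sketch applies two hypothetical patterns: a gloss that consists of a single word, and a gloss that is a short list of single words joined by commas, semicolons, or "or". The patterns, the function name `extract_synonyms`, and the toy dictionary are assumptions made for illustration only. They are not the authors' method.

```python
import re
from typing import Dict, List, Set

# Hypothetical pattern 1 (illustrative assumption): the whole gloss is one
# word, optionally hyphenated, e.g. "abandon: desert".
SINGLE_WORD = re.compile(r"^[a-z]+(?:-[a-z]+)*$")

# Hypothetical pattern 2 (illustrative assumption): separators used in glosses
# that list several one-word alternatives, e.g. "large, great or huge".
LIST_SEPARATOR = re.compile(r"\s*(?:,|;|\bor\b)\s*")


def extract_synonyms(dictionary: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Return candidate synonyms for each headword based on the shape of its glosses."""
    candidates: Dict[str, Set[str]] = {}
    for headword, glosses in dictionary.items():
        found: Set[str] = set()
        for gloss in glosses:
            # Normalise case and drop a trailing full stop before matching.
            text = gloss.strip().lower().rstrip(".")
            if SINGLE_WORD.match(text):
                # Pattern 1: a one-word gloss is taken as a synonym candidate.
                found.add(text)
                continue
            parts = [p for p in LIST_SEPARATOR.split(text) if p]
            if len(parts) > 1 and all(SINGLE_WORD.match(p) for p in parts):
                # Pattern 2: every item of a one-word list is a candidate.
                found.update(parts)
            # Longer, genus-differentia style glosses match neither pattern.
        found.discard(headword)  # a word is not its own synonym
        if found:
            candidates[headword] = found
    return candidates


if __name__ == "__main__":
    toy_dictionary = {
        "abandon": ["desert", "to give up completely"],
        "big": ["large, great or huge."],
        "dog": ["a domesticated carnivorous mammal"],
    }
    print(extract_synonyms(toy_dictionary))
```

On the toy dictionary, the sketch maps "abandon" to {"desert"} and "big" to {"large", "great", "huge"}, and it yields nothing for "dog", whose gloss is a full descriptive phrase. Patterns over gloss shape like these favour precision over coverage, which is one reason a learned model can be a useful complement.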