Dynamic Synonym Candidates Extraction for Searching Documents in a Corpus

This paper proposes a method for implementing real-time synonym search systems. Our final aim is to provide users with an interface through which they can query the system with a string of any length and receive a list of synonyms of that string. We propose an efficient algorithm for this operation. The strategy indexes documents with suffix arrays and dynamically retrieves the contexts of the query (i.e., the strings around it). The extracted contexts are in turn sent to the suffix arrays to retrieve the strings around those contexts, which are likely to contain synonyms of the query string.
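
The two-pass lookup described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the naive suffix-array construction, the character-level contexts, the frequency-based scoring, and the names `build_suffix_array`, `find_occurrences`, `synonym_candidates`, `width`, and `max_len` are all assumptions made for illustration.

```python
from collections import Counter


def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a small sketch,
    # too slow for a real corpus.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    # Binary search for the block of suffixes that start with `pattern`.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-char prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    hi = len(sa)
    while lo < hi:  # first suffix whose m-char prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def synonym_candidates(text, sa, query, width=4, max_len=12):
    # `width` (context size) and `max_len` (longest candidate) are
    # hypothetical parameters chosen for this sketch.
    if not query:
        return []
    # Pass 1: collect the (left, right) contexts around each query occurrence.
    contexts = Counter()
    for pos in find_occurrences(text, sa, query):
        end = pos + len(query)
        if pos < width or end + width > len(text):
            continue
        contexts[(text[pos - width:pos], text[end:end + width])] += 1
    # Pass 2: look each left context up again and keep the string that sits
    # between it and the matching right context.
    scores = Counter()
    for (left, right), weight in contexts.items():
        for lpos in find_occurrences(text, sa, left):
            start = lpos + width
            window = text[start:start + max_len + width]
            rpos = window.find(right, 1)  # require a non-empty candidate
            if rpos == -1:
                continue
            candidate = window[:rpos]
            if candidate != query:
                scores[candidate] += weight
    return scores.most_common()


corpus = ("i bought the car today. i bought the automobile today. "
          "he sold the car quickly. he sold the automobile quickly.")
sa = build_suffix_array(corpus)
print(synonym_candidates(corpus, sa, "car"))  # → [('automobile', 2)]
```

Because both passes are plain suffix-array lookups, nothing about the query has to be known at indexing time, which is what allows strings of any length to be queried dynamically.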
