Finding Semantic Similarity in Raw Text: the Deese Antonyms

As more and more text becomes available in electronic form, interest is growing in ways of automatically extracting information from subsets of this text. Manual indexing and automatic keyword indexing are both well known, but both have drawbacks. Recent research on robust syntactic analysis and statistical correlations promises that some of the intuitive advantages of manual indexing can be retained in a fully automatic system. Here I present an experiment performed with my system SEXTANT, which extracts semantically similar words from raw text. Using statistical methods combined with robust syntactic analysis, SEXTANT was able to find many of the intuitive pairings between semantically similar words studied by Deese [Deese, 1954].