Building a Scalable Database-Driven Reverse Dictionary

In this paper, we describe the design and implementation of a reverse dictionary. Unlike a traditional forward dictionary, which maps words to their definitions, a reverse dictionary takes a user input phrase describing the desired concept and returns a set of candidate words that satisfy the phrase. This work has significant application not only for the general public, particularly those who work closely with words, but also in the general field of conceptual search. We present a set of algorithms and the results of a set of experiments measuring the retrieval accuracy of our methods and the runtime response time of our implementation. Our experimental results show that our approach can provide significant improvements in performance at scale without sacrificing the quality of the result. Our experiments comparing the quality of our approach to that of currently available reverse dictionaries show that our approach can provide significantly higher quality than either of the other currently available implementations.
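To make the forward/reverse distinction concrete, the following is a minimal sketch and not the algorithms presented in this paper. It assumes a tiny hand-written forward dictionary (`FORWARD`), a small stop-word list, and a simple term-overlap score; all of these names and data are hypothetical and chosen only for illustration. It inverts the forward dictionary into a term-to-headword index and ranks headwords by how many query terms occur in their definitions.

```python
from collections import Counter, defaultdict

# Hypothetical toy forward dictionary (word -> definition); not data from the paper.
FORWARD = {
    "telescope": "an instrument for viewing distant objects",
    "microscope": "an instrument for viewing very small objects",
    "compass": "an instrument for finding direction",
}

# Assumed stop-word list, kept deliberately small for the example.
STOP_WORDS = {"a", "an", "the", "for", "of", "to", "very"}


def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def build_index(forward):
    """Invert the forward dictionary: definition term -> set of headwords."""
    index = defaultdict(set)
    for word, definition in forward.items():
        for term in tokenize(definition):
            index[term].add(word)
    return index


def reverse_lookup(phrase, index, k=3):
    """Rank headwords by the number of distinct query terms in their definitions."""
    scores = Counter()
    for term in set(tokenize(phrase)):
        for word in index.get(term, ()):
            scores[word] += 1
    # Sort by score descending, then alphabetically so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


INDEX = build_index(FORWARD)
print(reverse_lookup("instrument for viewing distant objects", INDEX))
# prints ['telescope', 'microscope', 'compass']
```

Exact term overlap is the simplest possible scoring choice; it misses synonyms and paraphrases, which is one reason a practical reverse dictionary needs more than this.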
