Semantic Similarity between Queries in QA System using a Domain-specific Taxonomy

Semantic similarity has been studied extensively over the past decades and has become a rapidly growing field of research. Sentence and short-text similarity measures play an important role in text-based applications such as text mining, information retrieval and question answering systems. In this paper we consider the problem of semantic similarity between queries in a question answering system, with the purpose of query recommendation. Our approach is based on an existing domain-specific taxonomy. We define innovative three-layered semantic similarity measures between queries, using existing similarity measures between ontology concepts combined with various set-based distance measures. We then analyse and evaluate our approach against human intuition using a data set of 90 questions. Furthermore, we argue that these measures are taxonomy-dependent and are influenced by various factors: taxonomy structure, keyword mappings, keyword weights, query-keyword mappings and the chosen concept similarity measure.
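The general idea of combining a concept-level similarity with a set-based measure can be illustrated with a minimal sketch. It is not the three-layered measure defined in the paper. It assumes a hypothetical toy taxonomy (`PARENT`), uses the Wu-Palmer measure as one possible concept similarity, and uses a symmetric average of best matches as one possible set-based measure; queries are assumed to be already mapped to sets of taxonomy concepts, and keyword weights are ignored.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy taxonomy (child -> parent); the root has parent None.
PARENT: Dict[str, Optional[str]] = {
    "product": None,
    "account": "product",
    "card": "product",
    "savings": "account",
    "checking": "account",
    "credit_card": "card",
    "debit_card": "card",
}


def path_to_root(concept: str) -> List[str]:
    """Return the concept followed by its ancestors, ending at the root."""
    path = [concept]
    node = PARENT[concept]
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def depth(concept: str) -> int:
    """Depth in the taxonomy, counting the root as depth 1."""
    return len(path_to_root(concept))


def lowest_common_subsumer(a: str, b: str) -> str:
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is deepest
        if node in ancestors_a:
            return node
    raise ValueError("concepts do not share a root")


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    lcs = lowest_common_subsumer(a, b)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def best_match_average(xs: Set[str], ys: Set[str]) -> float:
    """Average, over xs, of the best similarity to any concept in ys."""
    return sum(max(wu_palmer(x, y) for y in ys) for x in xs) / len(xs)


def query_similarity(q1: Set[str], q2: Set[str]) -> float:
    """Symmetric set-based similarity between two concept sets."""
    if not q1 or not q2:
        return 0.0
    return 0.5 * (best_match_average(q1, q2) + best_match_average(q2, q1))


if __name__ == "__main__":
    print(query_similarity({"savings", "credit_card"}, {"checking"}))
```

Swapping `wu_palmer` for another concept measure, or `best_match_average` for another set-based measure, changes the resulting scores, which is in line with the dependence on the chosen concept similarity measure and taxonomy structure noted in the abstract.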
