Using Clustering Approaches to Open-Domain Question Answering

This paper presents two novel clustering approaches and their application to open-domain question answering. First, we present the One-Sentence-Multi-Topic clustering approach, which clusters sentences to improve the language model used for sentence retrieval. Second, treating each cluster produced by One-Sentence-Multi-Topic clustering as a set of aligned sentences, we present a pattern-similarity-based clustering approach that automatically learns syntactic answer patterns for answer selection through vertical and horizontal clustering. Our experiments on Chinese question answering demonstrate that, when used for sentence clustering in question answering, One-Sentence-Multi-Topic clustering performs much better than K-Means and comparably to PLSI. Pattern-similarity-based clustering likewise proved effective at learning syntactic answer patterns: syntactic-pattern-based answer extraction achieves an absolute improvement of about 9% over retrieval-based answer extraction.