A retrieval model for questions in community question answering systems

The study of community-based question answering (cQA) services has become an emerging trend in Web information services, and one of its main tasks is retrieving similar questions. Because questions that are expressed differently may have the same, or very similar, meaning, similar questions are defined along syntactic, semantic, and pragmatic aspects according to the user's retrieval intention. Five models are selected as baselines: the Language Model, the Translation-based Language Model, a parser-based model, LDA, and a WordNet source-based model. To integrate features of the three aspects, integrated models are proposed that linearly combine the WordNet, Stanford Parser, and Language Model components and are further weighted by a syntactic feature. Experimental results show that the integrated models perform better than the basic models, especially when the linear combination of the WordNet model and the Stanford Parser model is weighted by syntactic information represented by proper noun phrases. The integrated models are further verified by logistic analysis.
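
The integration described above can be illustrated with a minimal sketch. It assumes that each component model (WordNet, Stanford Parser, Language Model) has already produced a similarity score for a candidate question, and that the syntactic feature is available as a single multiplicative weight. The names `CandidateScores`, `integrated_score`, and `rank_candidates`, the coefficients `alpha` and `beta`, and the multiplicative form of the weighting are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CandidateScores:
    """Per-candidate scores; all fields are illustrative assumptions."""
    wordnet: float           # semantic similarity from a WordNet-based model
    parser: float            # structural similarity from a parser-based model
    language_model: float    # query likelihood from a language model
    syntactic_weight: float  # e.g. overlap of proper noun phrases


def integrated_score(s: CandidateScores, alpha: float = 0.4, beta: float = 0.4) -> float:
    """Linearly combine the three scores, then scale by the syntactic weight.

    alpha and beta are hypothetical mixing coefficients; the language model
    receives the remaining mass so the three coefficients sum to 1.
    """
    gamma = 1.0 - alpha - beta
    if alpha < 0 or beta < 0 or gamma < -1e-9:
        raise ValueError("alpha and beta must be non-negative and sum to at most 1")
    gamma = max(gamma, 0.0)  # absorb floating-point rounding error
    combined = alpha * s.wordnet + beta * s.parser + gamma * s.language_model
    return s.syntactic_weight * combined


def rank_candidates(candidates: Dict[str, CandidateScores],
                    alpha: float = 0.4, beta: float = 0.4) -> List[str]:
    """Return candidate question ids sorted by descending integrated score."""
    return sorted(candidates,
                  key=lambda q: integrated_score(candidates[q], alpha, beta),
                  reverse=True)
```

Setting `alpha + beta = 1` drops the language model term, which corresponds to the case of combining only the WordNet and Stanford Parser models; how the coefficients would be tuned is left open here.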