A new question analysis approach for community question answering systems

A new question analysis approach is presented for Chinese community question answering (CQA) systems. It consists of two subtasks: multiple question type identification and question information analysis. For the first subtask, we assume that, according to its information needs, a question can belong to several question types rather than to a single one. For the second subtask, we present a Question Information Chunk Annotation (QICA) method, which classifies question information into five types according to their semantic roles. A data set of 22,000 questions is built, of which 12,000 are used as training data and the remaining 10,000 as test data. An SVM is used for the first subtask and achieves an average F-score of 86.3%. Max-Margin Markov Networks (M3Ns) are used for the second subtask; they yield an F-score of 86.86%, which is better than the results of three other models (ME, MEMM and CRF). Furthermore, to test and verify the new question analysis approach, a question paraphrase recognition experiment is conducted, and performance improves when the question analysis results are used. This research is expected to contribute to and stimulate further research in the field of QA.
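The abstract does not spell out how a question receives several types or how the reported F-scores are computed. The Python sketch below illustrates one plausible reading, under loudly stated assumptions: each question type has its own one-vs-rest binary classifier (for example, a linear SVM) whose real-valued scores are already available; a type is assigned whenever its score exceeds a threshold of 0; scores are micro-averaged F1; and QICA chunks are encoded as BIO tags. The type names (`definition`, `reason`, `method`), the chunk labels (`A`, `B`) and the helper names `assign_types`, `micro_f1` and `extract_chunks` are hypothetical placeholders, not the paper's categories or code, and no SVM or M3N model is trained here.

```python
from typing import Dict, Hashable, List, Set, Tuple


def assign_types(scores: Dict[str, float], threshold: float = 0.0) -> Set[str]:
    """Multi-label decision rule (assumed): keep every question type whose
    one-vs-rest classifier score exceeds the threshold. If none does, fall
    back to the single highest-scoring type so each question gets one type."""
    chosen = {qtype for qtype, score in scores.items() if score > threshold}
    if not chosen:
        chosen = {max(scores, key=scores.get)}
    return chosen


def micro_f1(gold: List[Set[Hashable]], pred: List[Set[Hashable]]) -> float:
    """Micro-averaged F1 (assumed averaging): pool true positives over all
    items, then compute precision and recall from the pooled counts."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    if tp == 0:
        return 0.0
    precision = tp / n_pred
    recall = tp / n_gold
    return 2 * precision * recall / (precision + recall)


def extract_chunks(tags: List[str]) -> Set[Tuple[int, int, str]]:
    """Convert a BIO tag sequence into (start, end, label) chunks, with `end`
    exclusive. A stray I- tag that does not continue a chunk starts a new one."""
    chunks: Set[Tuple[int, int, str]] = set()
    start, label = 0, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last chunk
        if tag.startswith("I-") and tag[2:] == label:
            continue  # the current chunk keeps growing
        if label is not None:
            chunks.add((start, i, label))
            label = None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    return chunks


if __name__ == "__main__":
    # Hypothetical per-type scores for one question: two types exceed 0.
    print(sorted(assign_types({"definition": -0.5, "reason": 0.7, "method": 0.2})))
    # Question-level scoring over two questions with multi-type gold labels.
    gold = [{"reason", "method"}, {"definition"}]
    pred = [{"reason"}, {"definition", "method"}]
    print(round(micro_f1(gold, pred), 3))
    # Chunk-level scoring: an exact (start, end, label) match counts as correct.
    gold_chunks = extract_chunks(["B-A", "I-A", "O", "B-B"])
    pred_chunks = extract_chunks(["B-A", "I-A", "B-B", "I-B"])
    print(micro_f1([gold_chunks], [pred_chunks]))
```

Run as a script, this prints `['method', 'reason']`, `0.667` and `0.5`. Reusing one F1 function for both subtasks works because each treats a prediction as a set of items (question types, or labelled chunk spans) compared against a gold set; the paper's actual averaging scheme may differ.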