A Semantic Similarity Computing Model based on Siamese Network for Duplicate Questions Identification

Traditional methods for computing semantic similarity mostly treat a text as a set of words: they build a feature vector from the counts of the words that occur in the text, then compute text similarity with a metric such as the cosine distance between the vectors. However, these methods consider only the word level of a sentence, not the semantic level, and so may ignore much important information, including syntax and word order. This paper proposes a new deep learning method that combines an attention mechanism with a BiLSTM in a Siamese network to perform semantic similarity matching for given question pairs. Experimental results show that our models can make full use of the semantic information in the text; the F1 value on the dataset provided by the CCKS2018 question-intention matching task is 0.84586, achieving fourth place in the final test.
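To make the limitation of the count-based baseline concrete, the following is a minimal sketch of a bag-of-words cosine similarity, not code from this paper. The helper names `bow_vector` and `cosine_similarity`, the whitespace tokenization, and the two English example questions are illustrative assumptions; the CCKS2018 data would need a proper word segmenter.

```python
import math
from collections import Counter


def bow_vector(text):
    """Build a word-count vector (assumption: lowercase, whitespace tokens)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical question pair: same words, different order, different meaning.
q1 = "does the bank charge the customer"
q2 = "does the customer charge the bank"
sim = cosine_similarity(bow_vector(q1), bow_vector(q2))
print(sim)  # approximately 1.0, because word order is discarded
```

Because both questions yield identical count vectors, the baseline scores them as near-perfect duplicates even though they ask different things, which is the kind of word-order information a sequence model such as a BiLSTM can retain.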