Semantic computation in geography question answering

In this paper, we develop a question answering system for single-option geography questions. We pursue two approaches. The first computes the semantic similarity between two questions. The second recasts the task as binary classification of question sentences, using distributed representations of sentence semantics. For semantic similarity, we first implement a basic framework based on bag-of-words (BOW) and then extend it with an edit-distance variant and a BM25 variant. For classification, we use a convolutional neural network (CNN) and a stacked denoising auto-encoder, respectively, to generate the distributed sentence representations; a logistic regression classifier then classifies each sentence from its representation. Our dataset is a large-scale set of geography questions from the Chinese college entrance examination, crawled from the internet. Experimental results show that the CNN answers single-option geography questions with high accuracy, reaching 0.7310.
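
The similarity-based direction can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes questions are already segmented into tokens, uses raw term counts for the BOW vectors, and uses plain Levenshtein distance for the edit-distance variant. The function names `bow_cosine`, `edit_distance`, and `most_similar`, and the idea of retrieving the closest stored question from a question bank, are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def bow_cosine(tokens_a, tokens_b):
    """Cosine similarity between two token lists using raw term counts (BOW)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty question has no similarity to anything
    return dot / (norm_a * norm_b)


def edit_distance(seq_a, seq_b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(seq_b) + 1))
    for i, x in enumerate(seq_a, 1):
        curr = [i]
        for j, y in enumerate(seq_b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def most_similar(query_tokens, bank):
    """Return the (tokens, answer) pair in `bank` closest to the query.

    `bank` is a hypothetical list of (token_list, answer) pairs.
    """
    return max(bank, key=lambda item: bow_cosine(query_tokens, item[0]))
```

A BM25 variant would replace the raw counts with term weights that account for document frequency and question length; it is omitted here to keep the sketch short.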
