Utilizing Bidirectional Encoder Representations from Transformers for Answer Selection

Pre-training a transformer-based model on the language modeling task over a large corpus and then fine-tuning it for downstream tasks has proven very effective in recent years. One major advantage of such pre-trained language models is that they can effectively capture the context of each word in a sentence. However, pre-trained language models have not yet been extensively applied to tasks such as answer selection. To investigate their effectiveness in such tasks, in this paper we adopt the pre-trained Bidirectional Encoder Representations from Transformers (BERT) language model and fine-tune it on two Question Answering (QA) datasets and three Community Question Answering (CQA) datasets for the answer selection task. We find that fine-tuning BERT for answer selection is very effective, observing maximum improvements of 13.1% on the QA datasets and 18.7% on the CQA datasets over the previous state-of-the-art.
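
As a rough illustration of the fine-tuning setup described above, the sketch below scores a question-candidate pair with a BERT sequence-pair classifier using the HuggingFace Transformers library. The checkpoint name, binary label semantics, and example text are assumptions made for illustration and are not the paper's exact configuration or hyperparameters.

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Assumed checkpoint and binary relevance labels (0 = irrelevant, 1 = relevant);
# the paper's actual training setup is not reproduced here.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

question = "What is the capital of France?"  # hypothetical example pair
candidate = "Paris is the capital and largest city of France."

# BERT consumes the pair as one sequence: [CLS] question [SEP] candidate [SEP]
inputs = tokenizer(question, candidate, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Probability that the candidate answers the question; during fine-tuning this
# classification head would be trained with cross-entropy on labeled QA pairs,
# and candidates would be ranked by this score at inference time.
relevance = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"relevance score: {relevance:.3f}")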
