Finding Similar Medical Questions from Question Answering Websites

The past few years have witnessed the flourishing of crowdsourced medical question answering (Q&A) websites. Patients who need medical information tend to post questions about their health conditions on these websites and get answers from other users. However, we observe that a large portion of new medical questions either are not answered in time or receive only a few answers. On the other hand, we notice that previously solved questions have great potential to address this problem. Motivated by these observations, we propose an end-to-end system that automatically finds similar questions for unsolved medical questions. By learning vector representations of unsolved questions and their candidate similar questions, the proposed system outputs similar questions according to the similarity between these vector representations. Because each vector represents a whole question, similar questions are found at the question level, which addresses the diversity of ways in which a medical question can be expressed. Further, we handle two more important issues, namely training data generation and efficiency, which are associated with the long short-term memory (LSTM) training procedure and the retrieval of candidate similar questions. The effectiveness of the proposed system is validated on a large-scale real-world dataset collected from a crowdsourced maternal-infant Q&A website.
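
The retrieval step described above, ranking candidate questions by the similarity of their vector representations to that of an unsolved question, can be illustrated with a minimal sketch. It is not the system's implementation: the abstract does not name the similarity measure, so cosine similarity is an assumption here, and the function names `cosine_similarity` and `rank_similar_questions`, the question ids, and the toy vectors are all hypothetical stand-ins for embeddings that a trained LSTM encoder would produce.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat it as dissimilar
    return dot / (norm_u * norm_v)


def rank_similar_questions(query_vec, candidates, top_k=3):
    """Return the top_k (question_id, score) pairs, most similar first."""
    scored = [
        (qid, cosine_similarity(query_vec, vec))
        for qid, vec in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for learned question embeddings (made-up values).
candidates = {
    "q1": [1.0, 0.0, 0.0],
    "q2": [0.0, 1.0, 0.0],
    "q3": [0.9, 0.1, 0.0],
}
ranked = rank_similar_questions([1.0, 0.0, 0.0], candidates, top_k=2)
for qid, score in ranked:
    print(qid, round(score, 3))
```

Scoring every candidate in this way is linear in the number of solved questions, which is one reason a separate candidate-retrieval step matters for efficiency on a large archive.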
