Facet Annotation by Extending CNN with a Matching Strategy

Most community question answering (CQA) websites manage large numbers of question-answer pairs (QAPs) through topic-based organization, which may not satisfy users' fine-grained search demands. Facets of topics serve as a powerful tool to navigate, refine, and group the QAPs. In this work, we propose FACM, a model that annotates QAPs with facets by extending convolutional neural networks (CNNs) with a matching strategy. First, phrase information is incorporated into the text representation by CNNs with different kernel sizes. Then, through a matching strategy between QAPs and facet label texts (FaLTs) acquired from Wikipedia, we generate similarity matrices to deal with facet heterogeneity. Finally, a three-channel CNN is trained to assign facet labels to QAPs. Experiments on three real-world data sets show that FACM outperforms state-of-the-art methods.
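
The matching step can be illustrated with a minimal sketch. It is not the FACM implementation: it assumes toy word vectors, replaces the learned convolution kernels with plain window averaging, uses cosine similarity as the matching function, and assumes that the three channels correspond to three phrase lengths. The function names (`ngram_vectors`, `similarity_matrix`, `three_channel_input`) are hypothetical.

```python
import math

def ngram_vectors(word_vecs, n):
    """Average each window of n consecutive word vectors.
    Assumption: a crude stand-in for a learned width-n convolution kernel."""
    dim = len(word_vecs[0])
    return [
        [sum(vec[d] for vec in word_vecs[i:i + n]) / n for d in range(dim)]
        for i in range(len(word_vecs) - n + 1)
    ]

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def similarity_matrix(qap_vecs, falt_vecs):
    """Rows index QAP phrases, columns index FaLT phrases."""
    return [[cosine(q, f) for f in falt_vecs] for q in qap_vecs]

def three_channel_input(qap_words, falt_words, sizes=(1, 2, 3)):
    """One similarity matrix per phrase length (assumed channel layout)."""
    return [
        similarity_matrix(ngram_vectors(qap_words, n), ngram_vectors(falt_words, n))
        for n in sizes
    ]

# Toy 2-dimensional "embeddings": a 4-word QAP and a 3-word FaLT.
qap = [[1, 0], [0, 1], [1, 1], [1, 0]]
falt = [[1, 0], [1, 1], [0, 1]]
channels = three_channel_input(qap, falt)
for n, matrix in zip((1, 2, 3), channels):
    print(f"phrase length {n}: {len(matrix)} x {len(matrix[0])} similarity matrix")
```

In this sketch the matrices differ in size across channels; a real classifier would need padding or pooling to a common shape before a CNN could consume them as channels.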