A novel community answer matching approach based on phrase fusion heterogeneous information network

Abstract Community Question Answering (CQA) allows users to ask and answer questions in a social way, and it is becoming the primary means by which people acquire knowledge. However, an asker must wait until a satisfactory answer appears, which reduces user activity. In this paper, we propose an answering method that automatically matches the most relevant answers to a new question. First, we use phrases to represent the semantics of posts (answers and questions) and construct a Phrase Fusion Heterogeneous Information Network, called PFHIN, to represent the complex entity relationships in CQA. Answer selection can then be treated as a related-entity retrieval task. Second, we define a distance between entities in PFHIN that is independent of the meta path. Finally, we propose the Type-constrained Top-k Similarity Entity Finding Algorithm (TTSEF), which finds the nearest entities given a known start entity and an end-entity type, and can therefore match the most relevant answers automatically. To the best of our knowledge, this is the first work to define a phrase information network for answer selection, and it provides a novel idea for heterogeneous information network fusion. Experimental results on three large-scale datasets from Stack Exchange (Stack Overflow, Super User, and Mathematics) demonstrate that the proposed approaches significantly outperform state-of-the-art answer retrieval methods. Moreover, we conduct an in-depth analysis of the meta paths that lead to the optimal answer and reveal the critical role of phrases in community answer matching.
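The retrieval step described in the abstract (start from a known entity, return the k nearest entities of a required type) can be illustrated with a small sketch. This is not the paper's TTSEF algorithm or its PFHIN distance, neither of which is specified in the abstract. As stated assumptions, the sketch substitutes weighted shortest-path distance computed with Dijkstra's algorithm, and the node names, node types, edge weights, and the function `top_k_typed_neighbors` are all hypothetical.

```python
import heapq
from collections import defaultdict


def top_k_typed_neighbors(edges, node_type, start, target_type, k):
    """Return up to k (entity, distance) pairs of the given type, nearest first.

    Assumption: distance is the weighted shortest-path length, which stands in
    for the paper's own (unspecified here) entity distance.
    """
    # Build an undirected adjacency list from (u, v, weight) triples.
    graph = defaultdict(list)
    for u, v, w in edges:
        graph[u].append((v, w))
        graph[v].append((u, w))

    dist = {start: 0.0}
    heap = [(0.0, start)]
    visited = set()
    results = []

    # Dijkstra settles nodes in order of increasing distance, so the first k
    # settled nodes of the target type are the k nearest ones.
    while heap and len(results) < k:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u != start and node_type[u] == target_type:
            results.append((u, d))
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return results


# Hypothetical toy network: questions and answers are linked through phrases.
node_type = {
    "q_new": "question", "q_old": "question",
    "p_heap": "phrase", "p_sort": "phrase",
    "a1": "answer", "a2": "answer", "a3": "answer",
}
edges = [
    ("q_new", "p_heap", 1.0),
    ("q_new", "p_sort", 2.0),
    ("p_heap", "a1", 1.0),
    ("p_sort", "a2", 2.0),
    ("p_heap", "q_old", 3.0),
    ("q_old", "a3", 0.5),
]

print(top_k_typed_neighbors(edges, node_type, "q_new", "answer", 2))
```

The search stops as soon as k entities of the requested type have been settled, so intermediate phrase and question nodes are traversed but never returned. The type constraint is applied only to the results, not to the paths that reach them.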
