Truth Discovery from Multi-Sourced Text Data Based on Ant Colony Optimization

In the era of information explosion, large volumes of data are generated through a variety of channels, such as social networks, crowdsourcing platforms, and blogs, and conflicts and errors constantly emerge. Truth discovery aims to find trustworthy information in conflicting data by taking source reliability into account. However, most traditional truth discovery approaches are designed only for structured data and cannot extract trustworthy information from unstructured raw text. The major challenges of inferring reliable information from text data stem from its multifactorial property (i.e., an answer may contain multiple different key factors, which may be complex) and the diversity of word usage (i.e., different words may carry similar semantic information even though their spellings are completely different). To address these challenges, we propose an ant colony optimization based truth discovery model for text data. First, the keywords extracted from all answers to a given question are grouped into a set. Then, we translate the truth discovery problem into a subset optimization problem and use parallel ant colony optimization to select, from the full keyword set, the correct keywords for each question based on the hypothesis of truth discovery. After that, the answers to each question are ranked by the similarity between the keywords of each user answer and the correct keywords identified by the colony. Experimental results on a real dataset show that, even when the semantic information of the text data is complex, the proposed model can still find trustworthy information in complex answers, in comparisons against retrieval-based and state-of-the-art approaches.
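
The three steps described above (pool the keywords, select a subset with ant colony optimization, rank the answers) can be illustrated with a minimal sketch. It is not the model proposed here. The objective function (`consensus`, the mean Jaccard similarity between a keyword subset and every answer), the fixed subset size `k`, the frequency-based heuristic, and all function and parameter names are assumptions made for illustration. The sketch also runs the ants sequentially rather than in parallel, and matches keywords by exact string, so it does not handle different words with similar meaning.

```python
import random


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (exact string match only)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def consensus(subset, answers):
    """Assumed objective: mean similarity between the subset and every answer."""
    return sum(jaccard(subset, ans) for ans in answers) / len(answers)


def aco_select_keywords(answers, k, n_ants=10, n_iters=30, rho=0.1, seed=0):
    """Pick up to k keywords from the pooled keyword set with a simple ant system.

    answers: non-empty list of non-empty keyword sets, one per user answer.
    """
    rng = random.Random(seed)
    candidates = sorted(set().union(*answers))
    # Heuristic desirability (assumption): share of answers containing the keyword.
    eta = {w: sum(w in a for a in answers) / len(answers) for w in candidates}
    # Pheromone is stored per keyword, as is usual for subset problems.
    tau = {w: 1.0 for w in candidates}
    best, best_score = set(), -1.0
    for _ in range(n_iters):
        for _ in range(n_ants):
            pool = list(candidates)
            subset = set()
            while len(subset) < k and pool:
                weights = [tau[w] * eta[w] for w in pool]
                choice = rng.choices(pool, weights=weights, k=1)[0]
                subset.add(choice)
                pool.remove(choice)
            score = consensus(subset, answers)
            if score > best_score:
                best, best_score = subset, score
        # Evaporate all pheromone, then reinforce the best subset found so far.
        for w in candidates:
            tau[w] *= 1.0 - rho
        for w in best:
            tau[w] += best_score
    return best, best_score


def rank_answers(answers, keywords):
    """Return answer indices sorted by decreasing similarity to the keywords."""
    scored = [(jaccard(ans, keywords), i) for i, ans in enumerate(answers)]
    return [i for _, i in sorted(scored, key=lambda t: (-t[0], t[1]))]


if __name__ == "__main__":
    # Toy keyword sets, invented for this example.
    toy_answers = [
        {"fever", "cough", "flu"},
        {"fever", "cough", "influenza"},
        {"fever", "flu"},
        {"headache", "stress"},
    ]
    keywords, score = aco_select_keywords(toy_answers, k=2)
    print(sorted(keywords), round(score, 3))
    print(rank_answers(toy_answers, keywords))
```

Storing pheromone on individual keywords rather than on edges between them reflects that the order in which a subset is assembled does not matter. A fuller treatment would replace `jaccard` with a semantic similarity measure so that words such as "flu" and "influenza" count as a match.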
