UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering

This paper describes UH-PRHLT, the system built for the three English subtasks of SemEval-2016 Task 3 by the Department of Computer Science of the University of Houston (UH) and the Pattern Recognition and Human Language Technology (PRHLT) research center at the Universitat Politècnica de València. The system represents each instance with both lexical and semantic similarity measures computed between text pairs. The semantic features draw on distributed representations of words, knowledge graphs generated with the BabelNet multilingual semantic network, and the FrameNet lexical database. In our experiments, the system outperforms the random and Google search engine baselines in all three English subtasks, and it obtained the best results among all participants in subtask B.
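
To make the idea of combining lexical and semantic similarity between a text pair concrete, the following is a minimal sketch and not the system's actual feature set. It assumes token-level Jaccard overlap as the lexical measure and the cosine similarity of averaged word vectors as the semantic measure. The function names and the tiny `TOY_EMBEDDINGS` table are hypothetical and exist only for illustration; the BabelNet and FrameNet features are not covered here.

```python
import math


def tokenize(text):
    """Lowercase whitespace tokenization (an illustrative simplification)."""
    return text.lower().split()


def jaccard(tokens_a, tokens_b):
    """Lexical similarity: shared token types over all token types."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average_vector(tokens, embeddings, dim):
    """Average the vectors of all tokens found in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pair_features(text_a, text_b, embeddings, dim):
    """Return [lexical similarity, semantic similarity] for a text pair."""
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    semantic = cosine(
        average_vector(tokens_a, embeddings, dim),
        average_vector(tokens_b, embeddings, dim),
    )
    return [jaccard(tokens_a, tokens_b), semantic]


# Hypothetical 2-dimensional vectors, invented purely for this example.
TOY_EMBEDDINGS = {
    "visa": [1.0, 0.0],
    "permit": [0.9, 0.1],
    "hotel": [0.0, 1.0],
}

if __name__ == "__main__":
    print(pair_features("visa", "permit", TOY_EMBEDDINGS, dim=2))
```

In this toy setup, "visa" and "permit" share no tokens, so the lexical feature is zero while the semantic feature stays high. Complementary behavior of this kind is the usual motivation for feeding both feature types to a classifier or ranker.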
