Cross-language Learning with Adversarial Neural Networks

We address the problem of cross-language adaptation for question-question similarity reranking in community question answering. The goal is to port a system trained on one input language to another input language, given labeled training data for the first language and only unlabeled data for the second. In particular, we propose to use adversarial training of neural networks to learn high-level features that are discriminative for the main learning task and, at the same time, invariant across the input languages. The evaluation results show sizable improvements for our cross-language adversarial neural network (CLANN) model over a strong non-adversarial system.
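To make the idea of features that are both task-discriminative and language-invariant concrete, here is a minimal sketch of one common way to set up such an objective: gradient reversal, as used in domain-adversarial training. The abstract does not say how CLANN implements its adversarial component, so the gradient-reversal formulation, the scalar toy model, and every name below (`adversarial_step`, the weights `feat`, `task`, `lang`, and the trade-off `lam`) are assumptions made purely for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce(p, y):
    """Binary cross-entropy for one prediction p and label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def adversarial_step(x, y_lang, w, y_task=None, lam=1.0):
    """Losses and gradients for one example in a toy scalar model.

    Hypothetical setup (not taken from the paper):
      feature         f      = w["feat"] * x
      task classifier p_task = sigmoid(w["task"] * f)
      language discr. p_lang = sigmoid(w["lang"] * f)
    y_task is None for examples from the unlabeled second language.
    """
    f = w["feat"] * x

    # Language discriminator: trained normally to tell the languages apart.
    p_lang = sigmoid(w["lang"] * f)
    d_lang = p_lang - y_lang  # dLoss/dlogit for sigmoid + cross-entropy
    losses = {"lang": bce(p_lang, y_lang), "task": 0.0}
    grads = {"lang": d_lang * f, "task": 0.0}

    # Gradient reversal: the discriminator's gradient reaches the feature
    # extractor negated and scaled by lam, pushing the features toward
    # being uninformative about the input language.
    g_f = -lam * d_lang * w["lang"]

    # Task classifier: contributes only when a task label is available.
    if y_task is not None:
        p_task = sigmoid(w["task"] * f)
        d_task = p_task - y_task
        losses["task"] = bce(p_task, y_task)
        grads["task"] = d_task * f
        g_f += d_task * w["task"]

    grads["feat"] = g_f * x
    return losses, grads


if __name__ == "__main__":
    w = {"feat": 1.0, "task": 1.0, "lang": 1.0}
    # Labeled example from the first language (language label 0).
    print(adversarial_step(1.0, 0, w, y_task=1))
    # Unlabeled example from the second language (language label 1).
    print(adversarial_step(1.0, 1, w))
```

In this sketch, setting `lam=0.0` removes the adversarial signal from the feature extractor, which corresponds to a plain non-adversarial model. Unlabeled second-language examples still update the features, but only through the reversed language gradient.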
