Generating titles for millions of browse pages on an e-Commerce site

We present two approaches to generating titles for browse pages in five languages: English, German, French, Italian, and Spanish. These browse pages are structured search pages in an e-commerce domain. We first present a rule-based approach to generating these browse page titles. We then present a hybrid approach that uses a phrase-based statistical machine translation engine on top of the rule-based system to assemble the best title. For English and German, we have access to a large number of existing rule-based titles and curated titles. For these two languages, we present an automatic post-editing approach that learns how to post-edit the rule-based titles into curated titles.
