An Entity-Focused Approach to Generating Company Descriptions

Finding high-quality descriptions of newer companies on the web, such as those found in Wikipedia articles, can be difficult: search engines return many pages of varying relevance, while multi-document summarization algorithms struggle to distinguish core facts from other information, such as news stories. In this paper, we propose an entity-focused, hybrid generation approach that automatically produces descriptions of previously unseen companies, and we show that it outperforms a strong summarization baseline.
