Unsupervised Content Discovery from Concise Summaries

Domain adaptation is a time-consuming and costly procedure, which calls for algorithms and tools that facilitate its automation. This paper presents an unsupervised algorithm that learns the main concepts in event summaries. The method takes as input a set of domain summaries annotated with shallow linguistic information and produces a domain template. We demonstrate the viability of the method by applying it to three different domains and two languages. We evaluated the generated templates against templates produced by humans, obtaining encouraging results.
