Unsupervised Learning Summarization Templates from Concise Summaries

We present and compare two unsupervised approaches for inducing the main conceptual information in rather stereotypical summaries in two different languages. We evaluate both approaches in two information extraction settings: monolingual and cross-lingual. The extraction systems are trained on auto-annotated summaries (containing the induced concepts) and evaluated on human-annotated documents. Extraction results are promising, being close in performance to those achieved when the system is trained on human-annotated summaries.
