Using the Omega Index for Evaluating Abstractive Community Detection

Numerous NLP tasks rely on clustering or community detection algorithms. For many of these tasks, the solutions are disjoint, and the relevant evaluation metrics assume nonoverlapping clusters. In contrast, the relatively recent task of abstractive community detection (ACD) results in overlapping clusters of sentences. ACD is a sub-task of an abstractive summarization system and represents a twostep process. In the first step, we classify sentence pairs according to whether the sentences should be realized by a common abstractive sentence. This results in an undirected graph with sentences as nodes and predicted abstractive links as edges. The second step is to identify communities within the graph, where each community corresponds to an abstractive sentence to be generated. In this paper, we describe how the Omega Index, a metric for comparing non-disjoint clustering solutions, can be used as a summarization evaluation metric for this task. We use the Omega Index to compare and contrast several community detection algorithms.

[1]  William M. Rand,et al.  Objective Criteria for the Evaluation of Clustering Methods , 1971 .

[2]  Mason A. Porter,et al.  Communities in Networks , 2009, ArXiv.

[3]  L. Collins,et al.  Omega: A General Formulation of the Rand Index of Cluster Recovery Suitable for Non-disjoint Solutions. , 1988, Multivariate behavioral research.

[4]  Igor Malioutov,et al.  Minimum Cut Model for Spoken Lecture Segmentation , 2006, ACL.

[5]  Steve Gregory,et al.  A Fast Algorithm to Find Overlapping Communities in Networks , 2008, ECML/PKDD.

[6]  Jean Carletta,et al.  Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus , 2007, Lang. Resour. Evaluation.

[7]  Elizabeth Shriberg,et al.  The ICSI Meeting Recorder Dialog Act (MRDA) Corpus , 2004, SIGDIAL Workshop.

[8]  Dragomir R. Radev,et al.  Experiments in Single and Multi-Document Summarization Using MEAD , 2001 .

[9]  Inderjeet Mani,et al.  Summarization Evaluation: An Overview , 2001, NTCIR.

[10]  Shafiq R. Joty,et al.  Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails , 2010, EMNLP.

[11]  James H. Martin,et al.  Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition, 2nd Edition , 2000, Prentice Hall series in artificial intelligence.

[12]  Inderjeet Mani,et al.  The Tipster Summac Text Summarization Evaluation , 1999, EACL.

[13]  Julia Hirschberg,et al.  Identifying Agreement and Disagreement in Conversational Speech: Use of Bayesian Networks to Model Pragmatic Dependencies , 2004, ACL.

[14]  Simone Teufel,et al.  Sentence extraction as a classification task , 1997 .

[15]  Hans Peter Luhn,et al.  The Automatic Creation of Literature Abstracts , 1958, IBM J. Res. Dev..

[16]  Mark T. Maybury,et al.  Automatic Summarization , 2002, Computational Linguistics.

[17]  M E J Newman,et al.  Community structure in social and biological networks , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[18]  Steve Gregory,et al.  An Algorithm to Find Overlapping Community Structure in Networks , 2007, PKDD.

[19]  Ani Nenkova,et al.  Automatic Summarization , 2011, ACL.

[20]  Regina Barzilay,et al.  Sentence Fusion for Multidocument News Summarization , 2005, CL.

[21]  H. P. Edmundson,et al.  New Methods in Automatic Extracting , 1969, JACM.

[22]  Giuseppe Carenini,et al.  Generating and Validating Abstracts of Meeting Conversations: a User Study , 2010, INLG.

[23]  Jade Goldstein-Stewart,et al.  The use of MMR, diversity-based reranking for reordering documents and producing summaries , 1998, SIGIR '98.