Paper Information - Experiments in Microblog Summarization

Experiments in Microblog Summarization

Abstract: This paper presents algorithms for summarizing microblog posts. In particular, our algorithms process collections of short posts on specific topics on the well-known site called Twitter and create short summaries from these collections of posts on a specific topic. The goal is to produce summaries that are similar to what a human would produce for the same collection of posts on a specific topic. We evaluate the summaries produced by the summarizing algorithms, compare them with human-produced summaries and obtain excellent results.

I. INTRODUCTION

Twitter, the microblogging site started in 2006, has become a social phenomenon, with more than 20 million visitors each month. While the majority of posts are conversational or not very meaningful, about 3.6% of the posts concern topics of mainstream news. At the end of 2009, Twitter had 75 million account holders, of which about 20% are active. There are approximately 2.5 million Twitter posts per day. To help people who read Twitter posts or tweets, Twitter provides a short list of popular topics called

Jugal K. Kalita | Beaux Sharifi | Mark-Anthony Hutton

[1] Stephen Wan, et al. Generating Overview Summaries of Ongoing Email Thread Discussions, 2004, COLING.

[2] Robert G. Farrell, et al. Summarizing electronic discourse, 2002, Intell. Syst. Account. Finance Manag.

[3] Johannes Gehrke, et al. CACTUS—clustering categorical data using summaries, 1999, KDD '99.

[4] Jugal Kalita, et al. A response to the need for summary responses, 1984.

[5] Inderjeet Mani, et al. The Challenges of Automatic Summarization, 2000, Computer.

[6] Eduard H. Hovy, et al. Automatic Evaluation of Summaries Using N-gram Co-occurrence Statistics, 2003, NAACL.

[7] J. Kalita, et al. Automatic Summarization of Twitter Topics, 2010.

[8] Yohei Seki, et al. Sentence Extraction by tf/idf and Position Weighting from Newspaper Articles, 2002, NTCIR.

[9] Rada Mihalcea, et al. TextRank: Bringing Order into Text, 2004, EMNLP.

[10] Dragomir R. Radev, et al. Centroid-based summarization of multiple documents, 2004, Inf. Process. Manag.

[11] Ken Lang, et al. NewsWeeder: Learning to Filter Netnews, 1995, ICML.