Summarization of Online Document Repositories

As websites have become more interactive in the Web 2.0 era, there has been an explosion of new user-generated content. The goal of the Summarization Pipeline for Online Repositories of Knowledge (SPORK) is to identify the key topics presented in multi-document texts, such as online comment threads. While most other automatic summarization systems focus only on finding the top-ranked sentences in the text, SPORK separates the text into clusters and identifies the different topics and opinions presented in it. In evaluation, SPORK identified 72% of the key topics present in a discussion, and up to 80% of the key topics in a well-structured discussion.
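
To make the clustering idea concrete, the following is a minimal sketch of how comments in a thread might be grouped and labeled by topic. It is not SPORK's implementation: the greedy single-pass clustering, the bag-of-words cosine similarity, the stopword list, the 0.3 threshold, and the names `tokenize`, `cosine`, `cluster_comments`, and `top_terms` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "an", "is", "are", "and", "or", "to", "of",
             "in", "it", "this", "that", "for", "on"}


def tokenize(text):
    """Lowercase a comment and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_comments(comments, threshold=0.3):
    """Greedy single-pass clustering.

    Each comment joins the most similar existing cluster if the similarity
    to that cluster's summed term counts reaches `threshold`; otherwise it
    starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [indices]}
    for index, comment in enumerate(comments):
        vector = Counter(tokenize(comment))
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vector, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": vector, "members": [index]})
        else:
            best["centroid"].update(vector)  # add this comment's term counts
            best["members"].append(index)
    return clusters


def top_terms(cluster, k=3):
    """Return the k most frequent terms in a cluster as a rough topic label."""
    return [term for term, _ in cluster["centroid"].most_common(k)]


# Made-up example thread, for illustration only.
comments = [
    "The battery life on this phone is terrible",
    "Battery drains fast, terrible battery life",
    "The camera takes great photos",
    "Great camera, photos look sharp",
]
clusters = cluster_comments(comments)
for cluster in clusters:
    print(cluster["members"], top_terms(cluster))
```

Under these assumptions, each cluster stands for one candidate topic, and its most frequent terms serve as a rough label. A sentence-ranking summarizer would instead pick the highest-scoring sentences overall, which can leave minority topics or opinions unrepresented.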