Arabic summarization in Twitter social network

Abstract

Twitter, an online microblogging service, enables its users to write and read short text-based posts known as "tweets", and it has become one of the most widely used social networks. However, when a user searches for a topic phrase, the returned tweets are sorted only by recency, not by relevance. The user must therefore read through the tweets manually to understand what is primarily being said about that topic. Several strategies have been developed for summarizing English microblogs, but Arabic microblog summarization remains an active research area. This paper presents a machine-learning-based solution for summarizing Arabic microblog posts, specifically posts written in the Egyptian dialect. The goal is to produce a short summary of the Arabic tweets related to a specific topic with less time and effort. The proposed strategy is evaluated, and its results are compared with those of well-known multi-document summarization algorithms (SumBasic, TF-IDF, PageRank, and MEAD) as well as with human-written summaries.
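To make one of the comparison baselines concrete, the following is a minimal sketch of a simplified SumBasic selector applied to a set of tweets. It is the generic frequency-based baseline, not the machine-learning method proposed in this paper. The function name `sumbasic` and the whitespace tokenization are illustrative assumptions. A real Arabic pipeline would also need normalization, stemming, and stop-word handling, none of which is shown here.

```python
from collections import Counter


def sumbasic(tweets: list[str], k: int) -> list[str]:
    """Select up to k tweets with a simplified SumBasic baseline.

    Hypothetical helper. Assumes whitespace tokenization and no
    Arabic-specific preprocessing (normalization, stemming, stop words).
    """
    tokenized = [t.split() for t in tweets]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    # Unigram probability of each word across the whole tweet collection.
    prob = {w: c / total for w, c in counts.items()}

    summary: list[str] = []
    # Indices of non-empty tweets that have not been selected yet.
    remaining = [i for i, toks in enumerate(tokenized) if toks]
    while remaining and len(summary) < k:
        # Highest-probability word among the tweets still available.
        top_word = max(
            (w for i in remaining for w in tokenized[i]), key=lambda w: prob[w]
        )
        # Among tweets containing that word, pick the one with the
        # highest average word probability.
        candidates = [i for i in remaining if top_word in tokenized[i]]
        best = max(
            candidates,
            key=lambda i: sum(prob[w] for w in tokenized[i]) / len(tokenized[i]),
        )
        summary.append(tweets[best])
        remaining.remove(best)
        # Squaring the probabilities of the chosen words lowers their weight,
        # which discourages selecting redundant tweets later.
        for w in set(tokenized[best]):
            prob[w] **= 2
    return summary


print(sumbasic(["x y", "x y", "z w"], 2))
```

With the toy input above, the duplicate tweet "x y" is skipped on the second pass because its words' probabilities were squared after the first selection, so the summary is `['x y', 'z w']`. This redundancy penalty is the main design idea behind SumBasic.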
