Automatic Arabic Text Summarization Using Analogical Proportions

Automatic text summarization is the process of generating or extracting a brief representation of an input text. The literature offers many extractive summarization algorithms tested on English and other-language datasets; however, only a few extractive Arabic summarizers exist, owing to the lack of large Arabic collections. This paper proposes and assesses new extractive single-document summarization approaches based on analogical proportions, which are statements of the form “a is to b as c is to d”. The goal is to study how well analogical proportions can represent the relationship between documents and their corresponding summaries. For this purpose, we propose two algorithms that quantify the relevance or irrelevance of a keyword extracted from the input text in order to build its summary. In the first algorithm, the analogical proportion representing this relationship only checks, in a binary way, whether the keyword occurs in a given document or summary, without considering its frequency in the text; the analogical proportion of the second algorithm takes this frequency into account. We assessed these two algorithms and compared them with four language-independent summarizers (LexRank, TextRank, Luhn and LSA (Latent Semantic Analysis)) on our large corpus ANT (Arabic News Texts) and on a small test collection, EASC (Essex Arabic Summaries Corpus), by computing the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) and BLEU (BiLingual Evaluation Understudy) metrics. The best results achieved are ROUGE-1 = 0.96 and BLEU-1 = 0.65, obtained on educational documents from the EASC collection, which outperform the best results of the LexRank algorithm. The proposed algorithms are also compared with three other Arabic extractive summarizers on the EASC collection and show better results: ROUGE-1 = 0.75 and BLEU-1 = 0.47 for the first algorithm, and ROUGE-1 = 0.74 and BLEU-1 = 0.49 for the second.
Experimental results demonstrate the value of analogical proportions for text summarization. In particular, the analogical summarizers significantly outperform three of the four language-independent summarizers in terms of BLEU-1 on the ANT collection, and no other summarizer significantly outperforms them on the EASC collection.
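
To make the notion of an analogical proportion concrete, the following minimal Python sketch evaluates “a is to b as c is to d” in two settings: a Boolean one, where each value only records whether a keyword is present (1) or absent (0), and a graded one, where values lie in [0, 1], for example normalized keyword frequencies. The function names, the normalization to [0, 1], and the choice of this particular graded extension are assumptions made for illustration; the sketch is not the scoring procedure of the two proposed algorithms, which is not detailed here.

```python
from itertools import product


def boolean_proportion(a: int, b: int, c: int, d: int) -> bool:
    """Boolean analogical proportion a : b :: c : d for values in {0, 1}.

    It holds when a differs from b exactly as c differs from d.
    """
    a, b, c, d = bool(a), bool(b), bool(c), bool(d)
    return ((a and not b) == (c and not d)) and ((not a and b) == (not c and d))


def graded_proportion(a: float, b: float, c: float, d: float) -> float:
    """One graded extension (an assumption) for values in [0, 1].

    Returns a degree in [0, 1] that equals 1.0 or 0.0 on Boolean inputs,
    in agreement with boolean_proportion.
    """
    same_direction = (a >= b and c >= d) or (a <= b and c <= d)
    if same_direction:
        return 1.0 - abs((a - b) - (c - d))
    return 1.0 - max(abs(a - b), abs(c - d))


# Enumerate the Boolean patterns for which the proportion holds.
valid_patterns = [p for p in product((0, 1), repeat=4) if boolean_proportion(*p)]
```

In the Boolean case the proportion holds for exactly six of the sixteen possible patterns, such as (1, 0, 1, 0) and (0, 0, 1, 1), and fails for the two "reversed" patterns (0, 1, 1, 0) and (1, 0, 0, 1). The graded version returns a degree rather than a yes/no answer, which is one way a frequency-aware comparison could be expressed.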
