Utilização de Grafos e Matriz de Similaridade na Sumarização Automática de Documentos Baseada em Extração de Frases

The internet has greatly increased the amount of information available. However, reading and understanding this information are costly tasks. In this scenario, Natural Language Processing (NLP) applications enable very important solutions, notably Automatic Text Summarization (ATS), which produces a summary from one or more source texts. Automatically summarizing one or more texts, however, is a complex task because of the difficulties inherent in analyzing the sources and generating the summary. This master's thesis describes the main techniques and methodologies (NLP and heuristics) for generating summaries. We also address and propose some heuristics based on graphs and a similarity matrix to measure the relevance of sentences and to generate summaries by extracting sentences. We used a multilingual corpus (English, French and Spanish) and the CSTNews (Brazilian Portuguese), RPM (French) and DECODA (French) corpora to evaluate the developed systems. The results obtained were quite interesting.
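The general idea of scoring sentences with a similarity matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the systems developed in the thesis: it assumes bag-of-words vectors, cosine similarity, and a relevance score equal to the sum of a sentence's similarities to all other sentences (its weighted degree in the sentence graph). The function names `tokenize`, `cosine`, `similarity_matrix` and `summarize` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    # Lowercase and split into word tokens (assumption: no stemming or stopword removal).
    return re.findall(r"\w+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two term-frequency Counters.
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def similarity_matrix(sentences):
    # Symmetric matrix of pairwise similarities; the diagonal is set to 0
    # so that a sentence does not vote for itself.
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(vectors)
    return [
        [cosine(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]


def summarize(sentences, k=2):
    # Score each sentence by its weighted degree in the similarity graph,
    # keep the k best, and return them in their original order.
    matrix = similarity_matrix(sentences)
    scores = [sum(row) for row in matrix]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

For example, `summarize(["The cat sat on the mat.", "The cat lay on the mat.", "Stocks fell sharply today."], k=2)` keeps the two mutually similar sentences, because the third shares no words with the others and receives a score of zero.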
