Insertion of Ontological Knowledge to Improve Automatic Summarization Extraction Methods

The vast availability of information sources has created a need for research on automatic summarization. Current methods work either by extraction or by abstraction. Extraction methods are attractive because they are robust and independent of the language used. An extractive summary is obtained by selecting sentences from the original source based on their information content. This selection can be automated using a classification function induced by a machine learning algorithm. The function classifies sentences into two groups, important and non-important, and the important sentences form the summary. However, the effectiveness of this function depends directly on the training set used to induce it. This paper proposes an original way of optimizing the training set by inserting lexemes obtained from ontological knowledge bases, so that the optimized training set is reinforced by ontological knowledge. An experiment with four machine learning algorithms was carried out to validate this proposal. The improvement achieved is clearly significant for each of these algorithms.
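The pipeline described above (induce a sentence classifier from a training set, enrich that training set with lexemes from an ontology, then keep the sentences classified as important) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `TOY_ONTOLOGY` dictionary stands in for a real ontological knowledge base, the bag-of-words features and the simple perceptron are placeholder choices rather than the algorithms evaluated in the paper, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy ontology: maps a word to related lexemes. It is a
# stand-in for a real ontological knowledge base, invented for this sketch.
TOY_ONTOLOGY = {
    "summary": ["abstract", "digest"],
    "method": ["approach", "technique"],
}


def tokenize(sentence):
    """Lowercase the sentence and strip basic punctuation from each word."""
    words = (w.strip(".,;:!?").lower() for w in sentence.split())
    return [w for w in words if w]


def features(sentence, ontology=None):
    """Bag-of-words counts, optionally enriched with ontology lexemes."""
    tokens = tokenize(sentence)
    if ontology is not None:
        # Insert the lexemes related to each token into the example.
        tokens = tokens + [lex for t in tokens for lex in ontology.get(t, [])]
    return Counter(tokens)


def score(feats, model):
    """Linear score of a feature bag under a (weights, bias) model."""
    weights, bias = model
    return bias + sum(weights[f] * v for f, v in feats.items())


def train_perceptron(examples, epochs=10):
    """Induce a linear classifier; labels are +1 (important) or -1."""
    weights, bias = Counter(), 0.0
    for _ in range(epochs):
        for feats, label in examples:
            # Update only on misclassified (or zero-score) examples.
            if label * score(feats, (weights, bias)) <= 0:
                for f, v in feats.items():
                    weights[f] += label * v
                bias += label
    return weights, bias


def is_important(sentence, model):
    """Classify a sentence as important (True) or non-important (False)."""
    return score(features(sentence), model) > 0


def summarize(sentences, model):
    """Extractive summary: keep the sentences classified as important."""
    return [s for s in sentences if is_important(s, model)]


# Tiny labeled training set, invented for the example.
data = [
    ("This method produces a summary.", 1),
    ("The weather was nice.", -1),
]
plain_model = train_perceptron([(features(s), y) for s, y in data])
enriched_model = train_perceptron(
    [(features(s, TOY_ONTOLOGY), y) for s, y in data]
)
```

In this sketch only the training examples are enriched; sentences to be summarized are classified from their own words. A classifier trained on the enriched set has weights for lexemes such as "technique" that never appear in the raw training sentences, which is the intuition behind reinforcing the training set with ontological knowledge.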
