A supervised learning method combined with dimensionality reduction in Vietnamese text summarization

The World Wide Web has made a vast amount of information available online. A keyword search returns results from many different websites, and a user cannot read all of them. Text summarization has therefore become an active topic, attracting researchers in data mining and natural language processing. For Vietnamese, several summarization methods adapted from those proposed for English have produced significant results. However, difficult problems remain in Vietnamese language processing, most notably concerning Vietnamese text segmentation tools and text summarization corpora. In this paper, we present a Vietnamese text summarization method based on sentence extraction that combines neural network learning with feature dimensionality reduction, in order to avoid the cost of building term sets and to reduce computational complexity. Experimental results show that our method is effective in reducing computational complexity and outperforms some previously proposed methods.
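The general shape of such a pipeline (reduce each sentence to a small fixed-size feature vector, learn a scorer, extract the top-scoring sentences) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the hashing trick stands in for the dimensionality reduction step, a single sigmoid unit stands in for the neural network, whitespace splitting stands in for a Vietnamese word segmenter, and the names `reduce_features`, `train`, `summarize`, and `DIM` are hypothetical.

```python
import math
import zlib

DIM = 16  # assumed size of the reduced feature space (hypothetical value)


def reduce_features(tokens, dim=DIM):
    """Map a bag of tokens to a fixed-size, L2-normalized vector.

    Uses the hashing trick as a stand-in reduction step, so no term set
    (vocabulary) has to be built. This is an assumption, not the paper's method.
    """
    vec = [0.0] * dim
    for tok in tokens:
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, b, x):
    """Probability-like score that a sentence belongs in the summary."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train(samples, labels, dim=DIM, epochs=200, lr=0.5):
    """Train a single sigmoid unit with stochastic gradient descent."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            g = score(w, b, x) - y  # gradient of log loss w.r.t. the pre-activation
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def summarize(sentences, w, b, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scored = [(score(w, b, reduce_features(s.split())), i)
              for i, s in enumerate(sentences)]
    top = sorted(scored, reverse=True)[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda t: t[1])]


# Toy data, invented for illustration only (label 1 = summary-worthy).
sentences = [
    "the storm closed the main airport on monday",
    "officials said flights resume on tuesday",
    "a local cafe sells good coffee",
    "the weather was pleasant last year",
]
labels = [1, 1, 0, 0]
X = [reduce_features(s.split()) for s in sentences]  # whitespace split is a placeholder segmenter
w, b = train(X, labels)
summary = summarize(sentences, w, b, k=2)
```

Because the reduced vectors have a fixed length regardless of vocabulary size, the cost of training and scoring in this sketch depends on `DIM` rather than on the number of distinct terms, which is the kind of saving the abstract refers to.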
