How to Improve Text Summarization and Classification by Mutual Cooperation on an Integrated Framework

An effective integrated framework using both summary and category information is proposed. The summarization technique utilizes the category information from classification. The classification technique utilizes the summary information from summarization. This integrated framework achieves significant improvement.

Text summarization and classification are core techniques for analyzing huge amounts of text data in the big data environment. Moreover, as the need to read texts on smartphones, tablets, and televisions as well as personal computers continues to grow, text summarization and classification become more important, and both are essential processes for text analysis in many applications. Traditionally, text summarization and classification have been treated as separate research fields in the literature. However, we find that they can help each other: text summarization can make use of category information from text classification, and text classification can make use of summary information from text summarization. Therefore, in this paper we propose an effective integrated learning framework that uses both summary and category information. In this framework, the feature-weighting method for text summarization utilizes a language model to combine the feature distributions of each category and text, and the feature-weighting method for text classification uses the sentence importance scores estimated by text summarization. In the experiments, the integrated framework performs better than individual text summarization and classification. In addition, the framework is easy to implement and language independent because it relies only on simple statistical approaches and a POS tagger.
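The mutual cooperation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the linear interpolation of a text-level and a category-level unigram distribution, the mixing weight `lam`, the mean-weight sentence score, and all function names are hypothetical and are not the authors' method.

```python
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram distribution over a list of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def interpolate(p_text, p_category, lam=0.5):
    """Assumed combination: linear interpolation of the two distributions."""
    vocab = set(p_text) | set(p_category)
    return {
        w: lam * p_text.get(w, 0.0) + (1 - lam) * p_category.get(w, 0.0)
        for w in vocab
    }


def sentence_scores(sentences, p_mix):
    """Summarization side: score a sentence by the mean weight of its words."""
    return [
        sum(p_mix.get(w, 0.0) for w in s) / len(s) if s else 0.0
        for s in sentences
    ]


def weighted_term_counts(sentences, scores):
    """Classification side: each term occurrence counts as much as the
    importance score of the sentence it appears in."""
    feats = Counter()
    for sentence, score in zip(sentences, scores):
        for w in sentence:
            feats[w] += score
    return dict(feats)


# Toy data (hypothetical): one document and a sample of its category's text.
sentences = [["stocks", "fell", "sharply"], ["the", "weather", "was", "mild"]]
category_tokens = ["stocks", "market", "shares", "fell", "stocks"]

p_text = unigram([w for s in sentences for w in s])
p_cat = unigram(category_tokens)
p_mix = interpolate(p_text, p_cat, lam=0.5)

scores = sentence_scores(sentences, p_mix)          # category info -> summarization
features = weighted_term_counts(sentences, scores)  # summary info -> classification
```

In this toy setup the first sentence shares vocabulary with the category sample, so it receives the higher importance score, and its terms in turn carry more weight in the classification features.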
