Keyword extraction using backpropagation neural networks and rule extraction

Keyword extraction is vital for knowledge management systems, information retrieval systems, and digital libraries, as well as for general web browsing. Keywords are often the basis of document processing methods such as clustering and retrieval, since processing every word in a document can be slow. Keyword extraction is commonly automated with statistics-based methods such as Bayesian classifiers, K-Nearest Neighbor, and Expectation-Maximization. These models are limited in the word-related features they can use, since adding more features makes the models more complex and harder to comprehend. In this research, a neural network, specifically a backpropagation network, will be used to generalize the relationship between the title and the content of articles in an archive, using word features beyond TF-IDF, such as the position of a word in the sentence, paragraph, or entire document; formatting such as headings; and other attributes defined beforehand. To explain how the trained backpropagation network works, a rule extraction method will be used to extract symbolic rules from it. The extracted rules can then be transformed into decision trees that perform almost as accurately as the network, with the added benefit of being in an easily comprehensible format.
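As a rough illustration of the kind of per-word feature vector described above, the following sketch computes a TF-IDF score and a relative first position for each word in an article body, and marks whether the word also appears in the title. It is a minimal sketch under stated assumptions, not the feature set used in this research: the function names, the tokenizer, the IDF formula, and the use of "appears in title" as a training label are all hypothetical choices made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase alphabetic runs only (no stemming).
    return re.findall(r"[a-z]+", text.lower())


def word_features(title, body, doc_freq, n_docs):
    """Return {word: feature dict} for every distinct word in `body`.

    doc_freq maps a word to the number of archive documents containing it;
    words missing from doc_freq are treated as appearing in one document
    (an assumption made to avoid division by zero).
    """
    tokens = tokenize(body)
    title_words = set(tokenize(title))
    counts = Counter(tokens)

    # Index of the first occurrence of each word in the body.
    first_index = {}
    for i, tok in enumerate(tokens):
        first_index.setdefault(tok, i)

    features = {}
    for word, count in counts.items():
        tf = count / len(tokens)
        idf = math.log(n_docs / max(1, doc_freq.get(word, 0)))
        features[word] = {
            "tf_idf": tf * idf,
            # 0.0 means the word opens the document; values near 1.0 mean it appears late.
            "first_position": first_index[word] / len(tokens),
            # Assumed training label: 1 if the word also occurs in the title.
            "in_title": 1 if word in title_words else 0,
        }
    return features


if __name__ == "__main__":
    feats = word_features(
        title="Neural keyword extraction",
        body="Keyword extraction finds keyword candidates. Neural networks rank them.",
        doc_freq={"keyword": 1, "them": 10},
        n_docs=10,
    )
    for word, f in sorted(feats.items()):
        print(word, f)
```

In a setup like this, the numeric features would be the inputs to the network and the title flag would be the target; further attributes such as heading membership or sentence and paragraph position could be added as extra entries in each feature dictionary.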
