Paper information - A Feature Selection for Text Categorization on Research Support System Papits


We have developed a research support system called Papits that shares research information, such as PDF files of research papers, among computers on a network and classifies that information by research field. Users of Papits can share various kinds of research information and survey the corpora of their particular fields of research. To realize Papits, we need a mechanism for identifying which words are best suited to classifying documents into predefined classes. We must also handle cases where documents are classified into a multivalued category and where there is insufficient training data. In this paper, we present an implementation of automatic classification for Papits based on a text classification technique. We also propose a new feature selection method for classifying documents, represented as bags of words, into a multivalued category. Our method transforms the multivalued category into a binary category so that the characteristic words for classifying a category can be identified easily even from a small amount of training data. Our experimental results indicate that the method classifies documents in Papits effectively.
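The idea of turning a multivalued category into binary ones before selecting features can be illustrated with a minimal sketch. The abstract does not say which scoring function the authors use, so information gain is an assumption here, and the function names (`entropy`, `information_gain`, `select_features`) and the toy documents are hypothetical, chosen only for illustration.

```python
import math


def entropy(pos, neg):
    """Binary entropy in bits for `pos` positive and `neg` negative documents."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def information_gain(bags, labels, word):
    """Information gain of the presence/absence of `word` for 0/1 labels.

    Assumption: information gain stands in for the paper's scoring function.
    """
    with_word = [y for bag, y in zip(bags, labels) if word in bag]
    without_word = [y for bag, y in zip(bags, labels) if word not in bag]
    n = len(labels)
    gain = entropy(sum(labels), n - sum(labels))
    for part in (with_word, without_word):
        if part:
            gain -= len(part) / n * entropy(sum(part), len(part) - sum(part))
    return gain


def select_features(docs, categories, top_k):
    """Binarize the labels per category (category vs. rest), then keep top_k words."""
    bags = [set(doc.lower().split()) for doc in docs]  # bag-of-words as word sets
    vocab = sorted(set().union(*bags))
    selected = {}
    for cat in sorted(set(categories)):
        binary = [1 if c == cat else 0 for c in categories]
        ranked = sorted(
            vocab,
            key=lambda w: information_gain(bags, binary, w),
            reverse=True,
        )
        selected[cat] = ranked[:top_k]
    return selected


# Toy data, invented for this example only.
docs = [
    "agent negotiation protocol",
    "agent auction mechanism",
    "svm text kernel",
    "text classification svm",
    "image segmentation edge",
    "image edge detection",
]
categories = ["agents", "agents", "ml", "ml", "vision", "vision"]

selected = select_features(docs, categories, top_k=2)
for cat, words in selected.items():
    print(cat, words)
```

Scoring each word against one binary problem at a time lets a word that is strongly tied to a single field rank highly for that field, even when only a couple of documents per field are available.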
