Incremental context mining for adaptive document classification

Automatic document classification (DC) is essential for the management of information and knowledge. This paper explores two practical issues in DC: (1) each document has its own context of discussion, and (2) both the content and the vocabulary of the document database are intrinsically evolving. These issues call for adaptive document classification (ADC), which adapts a DC system to the evolving contextual requirement of each document category so that input documents may be classified based on their contexts of discussion. We present an incremental context mining technique to tackle the challenges of ADC. Theoretical analyses and empirical results show that, given a text hierarchy, the mining technique is efficient in incrementally maintaining the evolving contextual requirement of each category. Based on the contextual requirements mined by the system, higher-precision DC may be achieved with better efficiency.
