A Fast Algorithm for Hierarchical Text Classification

Text classification is becoming increasingly important with the proliferation of the Internet and the huge amount of data it transfers. We present an efficient algorithm for text classification that uses hierarchical classifiers based on a concept hierarchy. A simple TFIDF classifier is trained on sample data and then used to classify new data. Despite the classifier's simplicity, experiments on Web pages and TV closed captions show high classification accuracy, and applying feature subset selection techniques further improves performance. The algorithm is computationally efficient, with a running time bounded by O(n log n) for n samples.
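To make the idea of combining a TFIDF classifier with a concept hierarchy concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes a centroid-style TFIDF classifier (each class is the normalized sum of its training documents' TFIDF vectors, and a document goes to the class with the highest cosine similarity) and a greedy top-down descent in which each internal node of the hierarchy has its own flat classifier that picks one child. The names `TfidfCentroidClassifier`, `HierarchicalClassifier`, the dictionary format of `hierarchy`, the whitespace tokenizer and the smoothed IDF formula are all hypothetical choices made for illustration. Feature subset selection is not shown, and the sketch does not attempt to reproduce the O(n log n) bound stated above.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


class TfidfCentroidClassifier:
    """Hypothetical flat TFIDF classifier: one centroid per class, cosine similarity."""

    def fit(self, docs, labels):
        tokenized = [tokenize(d) for d in docs]
        n = len(tokenized)
        df = Counter()
        for toks in tokenized:
            df.update(set(toks))  # document frequency of each term
        # Assumed smoothed IDF, so terms present in every document keep a positive weight.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
        sums = defaultdict(Counter)
        for toks, y in zip(tokenized, labels):
            for t, w in self._vector(toks).items():
                sums[y][t] += w
        # Each class centroid is the length-normalized sum of its documents' vectors.
        self.centroids = {y: self._normalize(v) for y, v in sums.items()}
        return self

    def _vector(self, toks):
        # Raw term frequency times IDF; terms unseen during training are dropped.
        tf = Counter(toks)
        return self._normalize({t: c * self.idf[t] for t, c in tf.items() if t in self.idf})

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

    def predict(self, doc):
        vec = self._vector(tokenize(doc))

        def cosine(centroid):
            # Both vectors are unit length, so the dot product is the cosine similarity.
            return sum(w * centroid.get(t, 0.0) for t, w in vec.items())

        return max(self.centroids, key=lambda y: cosine(self.centroids[y]))


class HierarchicalClassifier:
    """Hypothetical top-down classifier: each internal node picks a child until a leaf is reached."""

    def __init__(self, hierarchy):
        # hierarchy maps each internal node to the list of its children (assumed format).
        self.hierarchy = hierarchy

    def fit(self, docs, leaf_labels, root="root"):
        parent = {c: p for p, kids in self.hierarchy.items() for c in kids}

        def path(leaf):
            nodes = [leaf]
            while nodes[-1] != root:
                nodes.append(parent[nodes[-1]])
            return nodes[::-1]  # root, ..., leaf

        # Each internal node is trained only on documents whose leaf lies beneath it,
        # labeled with the child of that node on the path to the leaf.
        per_node = defaultdict(lambda: ([], []))
        for doc, leaf in zip(docs, leaf_labels):
            p = path(leaf)
            for node, child in zip(p, p[1:]):
                per_node[node][0].append(doc)
                per_node[node][1].append(child)
        self.root = root
        self.node_clf = {node: TfidfCentroidClassifier().fit(d, y) for node, (d, y) in per_node.items()}
        return self

    def predict(self, doc):
        node = self.root
        while node in self.node_clf:  # stop at a node with no trained classifier (a leaf)
            node = self.node_clf[node].predict(doc)
        return node


hierarchy = {
    "root": ["sports", "science"],
    "sports": ["soccer", "tennis"],
    "science": ["physics", "biology"],
}
docs = [
    "goal striker penalty match football",
    "racket serve ace wimbledon match",
    "quantum particle energy photon",
    "cell gene protein evolution",
]
labels = ["soccer", "tennis", "physics", "biology"]
clf = HierarchicalClassifier(hierarchy).fit(docs, labels)
print(clf.predict("the striker scored a penalty goal"))  # -> soccer
```

Because prediction follows a single root-to-leaf path, each new document is compared only against the children of the nodes on that path rather than against every leaf category. Each node also computes its IDF from its own training subset, so a term that separates two sibling categories can carry weight at that node even if it is common elsewhere in the collection; this per-node weighting is a design choice of the sketch.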
