Hierarchical Text Classification

The vast quantity of information on the Web has led to a proliferation of topic hierarchies that let users browse for Web pages rather than search for them. In our research on conceptual retrieval, we classify documents during indexing so that they can later be retrieved by a combination of keyword and conceptual match. These and other applications have created a need for tools that automatically classify new documents with respect to such hierarchies. Most approaches use flat classifiers that ignore the hierarchical structure and treat each topic as a separate class. Although flat classifiers are computationally simple, they fail to exploit the information inherent in the structural relationships between topics. This paper explores the use of hierarchical structure for classifying a large, heterogeneous collection of Web content. Using the hierarchical structure during classification yielded a significant improvement of 45.4% in exact-match precision compared with a flat classifier.
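
To make the distinction between flat and hierarchical classification concrete, the following is a minimal sketch and not the classifier evaluated in this paper. It assumes a hypothetical two-level hierarchy (the `HIERARCHY` dictionary, its topic names, and its training text are invented for illustration) and a simple term-overlap score. The flat variant compares a document against every leaf topic at once; the hierarchical variant first picks a top-level topic and then chooses only among that topic's children.

```python
from collections import Counter

# Hypothetical toy hierarchy: top-level topic -> leaf topic -> training text.
# All names and text here are illustrative assumptions, not data from the paper.
HIERARCHY = {
    "Sports": {
        "Soccer": "goal league striker match pitch",
        "Tennis": "serve racket court match set",
    },
    "Computers": {
        "Hardware": "chip memory board processor court",
        "Software": "compiler code program bug set",
    },
}


def bag(text):
    """Return a bag-of-words (term -> count) for a whitespace-tokenized text."""
    return Counter(text.lower().split())


def overlap(doc, profile):
    """Count term occurrences shared by a document and a topic profile."""
    return sum((doc & profile).values())


def classify_flat(text):
    """Flat classification: score every leaf topic, ignoring the hierarchy."""
    doc = bag(text)
    leaves = {
        (top, leaf): bag(train)
        for top, children in HIERARCHY.items()
        for leaf, train in children.items()
    }
    return max(leaves, key=lambda key: overlap(doc, leaves[key]))


def classify_hierarchical(text):
    """Top-down classification: choose a top-level topic, then one of its leaves."""
    doc = bag(text)
    # A top-level profile is the combined vocabulary of its child topics.
    tops = {
        top: sum((bag(train) for train in children.values()), Counter())
        for top, children in HIERARCHY.items()
    }
    best_top = max(tops, key=lambda top: overlap(doc, tops[top]))
    children = HIERARCHY[best_top]
    best_leaf = max(children, key=lambda leaf: overlap(doc, bag(children[leaf])))
    return best_top, best_leaf


if __name__ == "__main__":
    print(classify_flat("striker scores a goal in the match"))
    print(classify_hierarchical("the compiler found a bug in the program code"))
```

In this sketch the hierarchical variant scores two top-level profiles and then two leaves, whereas the flat variant scores all four leaves directly. The example illustrates only the control flow; it makes no claim about the precision figures reported above.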