Deep classifier: automatically categorizing search results into large-scale hierarchies

Organizing Web search results into hierarchical categories makes them easier to browse, especially for ambiguous queries whose results mix several senses together. Previous methods for search result classification usually pre-train a classification model on a fixed, shallow hierarchy in which only the top two levels of a Web taxonomy are used. Such classification may be too coarse for browsing, since most search results end up in only two or three shallow categories. A deep hierarchical classifier, by contrast, must provide many more categories. However, the performance of such classifiers is usually limited because their classification effectiveness can deteriorate rapidly at the third or fourth level of a hierarchy. In this paper, we propose a novel algorithm, Deep Classifier, that classifies search results into detailed hierarchical categories with higher effectiveness than previous approaches. Given the search results returned for a query, the algorithm first prunes a wide-ranging hierarchy into a narrow one with the help of Web directories. Different strategies are then proposed for selecting training data by exploiting the hierarchical structure. Finally, a discriminative naive Bayesian classifier is developed to perform efficient and effective classification. As a result, the algorithm provides more meaningful and specific class labels for search result browsing than shallow classification does. Our experiments show that Deep Classifier achieves significant improvement over state-of-the-art algorithms. In addition, with sufficient off-line preparation, the proposed algorithm is efficient enough for online application.
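
The prune-then-classify pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the category paths, the toy training documents, and the helper `prune_hierarchy` are hypothetical, the candidate categories are assumed to come from a Web directory lookup, and a plain multinomial naive Bayes with Laplace smoothing stands in for the discriminative naive Bayesian classifier, whose details the abstract does not give.

```python
import math
from collections import Counter, defaultdict


def prune_hierarchy(candidate_paths):
    """Keep each candidate category and all of its ancestors (hypothetical helper)."""
    kept = set()
    for path in candidate_paths:
        parts = path.split("/")
        for depth in range(1, len(parts) + 1):
            kept.add("/".join(parts[:depth]))
    return kept


class MultinomialNB:
    """Plain multinomial naive Bayes with Laplace smoothing (a stand-in model)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy labeled pages for deep categories (made-up data for illustration only).
training = [
    ("Computers/Programming/Languages/Java", ["java", "compiler", "jvm", "class"]),
    ("Recreation/Travel/Asia/Indonesia/Java", ["java", "island", "travel", "indonesia"]),
    ("Shopping/Food/Coffee", ["coffee", "beans", "roast"]),
]

# Assumed output of a directory lookup for the ambiguous query "java".
candidates = [
    "Computers/Programming/Languages/Java",
    "Recreation/Travel/Asia/Indonesia/Java",
]

kept = prune_hierarchy(candidates)
pruned = [(label, toks) for label, toks in training if label in kept]
clf = MultinomialNB().fit([toks for _, toks in pruned], [label for label, _ in pruned])
prediction = clf.predict(["java", "jvm", "tutorial"])
print(prediction)
```

Pruning first means the classifier only has to separate the few deep categories that are plausible for the query, rather than every node of the full taxonomy, which is the intuition the abstract gives for why deep labels become feasible.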
