Hybrid Architecture for Web Search Systems Based on Hierarchical Taxonomies

Search systems based on hierarchical taxonomies provide a type of search functionality that conventional search engines do not offer. For instance, using a taxonomy, the user can look for documents related to just one of the categories of the taxonomy. This paper describes a hybrid data architecture that improves the performance of searches restricted to a few categories of a taxonomy. The proposed architecture is based on a hybrid data structure composed of an inverted file with multiple integrated signature files. A detailed analysis of superimposed codes on directed acyclic graphs shows that they are well suited to a search system based on a hierarchical ontology. Two variants are presented: the hybrid architecture with complete information and the hybrid architecture with partial information. The hybrid architecture was evaluated by implementing it and comparing it against a basic architecture. The performance of restricted queries is clearly improved, especially with the hybrid architecture with partial information. This variant outperformed the basic architecture by 50% in all workload environments, with a slight reduction in performance for the lower levels of the graph.
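
The general idea of combining an inverted file with superimposed-code signatures over a category graph can be illustrated with a minimal sketch. This is not the paper's implementation: the signature width, the number of bits per category, the hashing scheme, and all function names below are assumptions chosen for illustration, and the sketch does not distinguish the complete-information and partial-information variants. Each category receives a sparse bit code, a document's signature is the bitwise OR of the codes of its categories and all their ancestors in the DAG, and each posting in the inverted file carries that signature so that a restricted query can discard most non-matching documents with one bit test.

```python
import hashlib

SIG_BITS = 64       # assumed signature width in bits
BITS_PER_CODE = 3   # assumed number of bits set per category code


def category_code(name):
    """Derive a sparse bit code for a category (hypothetical hashing scheme)."""
    code = 0
    counter = 0
    while bin(code).count("1") < BITS_PER_CODE:
        digest = hashlib.sha256(f"{name}:{counter}".encode()).digest()
        code |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
        counter += 1
    return code


def ancestors(category, parents):
    """Return the category plus every category reachable upward in the DAG."""
    seen = set()
    stack = [category]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def document_signature(categories, parents):
    """Superimpose (bitwise OR) the codes of all categories and their ancestors."""
    sig = 0
    for cat in categories:
        for node in ancestors(cat, parents):
            sig |= category_code(node)
    return sig


def build_index(docs, parents):
    """Build an inverted file whose postings carry the document signature."""
    index = {}
    for doc_id, (terms, cats) in docs.items():
        sig = document_signature(cats, parents)
        for term in set(terms):
            index.setdefault(term, []).append((doc_id, sig))
    return index


def candidate_search(index, term, category):
    """Signature filter only: may return false drops, never misses a true match."""
    code = category_code(category)
    return [doc_id for doc_id, sig in index.get(term, []) if (sig & code) == code]


def restricted_search(index, docs, parents, term, category):
    """Filter by signature, then verify candidates against the real category graph."""
    result = []
    for doc_id in candidate_search(index, term, category):
        _, cats = docs[doc_id]
        if any(category in ancestors(cat, parents) for cat in cats):
            result.append(doc_id)
    return result


# Toy data: a two-branch taxonomy and two documents sharing the term "index".
parents = {"Databases": ["Computing"], "Football": ["Sports"]}
docs = {
    1: (["index", "query"], ["Databases"]),
    2: (["index", "goal"], ["Football"]),
}
index = build_index(docs, parents)
print(restricted_search(index, docs, parents, "index", "Computing"))
```

Because several codes are superimposed in one signature, the bit test can admit false drops, which is why the sketch verifies candidates against the actual category assignments before returning them. A wider signature or fewer bits per code lowers the false-drop rate at the cost of larger postings.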
