Paper Information - Data Structures for Efficient Broker Implementation

Data Structures for Efficient Broker Implementation

With the profusion of text databases on the Internet, it is becoming increasingly hard to find the most useful databases for a given query. To attack this problem, several existing and proposed systems employ brokers to direct user queries, using a local database of summary information about the available databases. This summary information must effectively distinguish relevant databases, and must be compact while allowing efficient access. We offer evidence that one broker, GlOSS, can be effective at locating databases of interest even in a system of hundreds of databases, and examine the performance of accessing the GlOSS summaries for two promising storage methods: the grid file and partitioned hashing. We show that both methods can be tuned to provide good performance for a particular workload (within a broad range of workloads), and discuss the tradeoffs between the two data structures. As a side effect of our work, we show that grid files are more broadly applicable than previously thought; in particular, we show that by varying the policies used to construct the grid file we can provide good performance for a wide range of workloads even when storing highly skewed data.
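To make the storage problem concrete, the following is a minimal sketch of partitioned hashing over two-attribute keys. It is not the paper's implementation. It assumes, for illustration only, that each summary record maps a (word, database) pair to a count; the class name `PartitionedHash`, its methods, and the default bit allocation are all hypothetical. The bucket address concatenates a few hash bits from each attribute, so a lookup that fixes only one attribute has to scan every value of the other attribute's bits.

```python
import zlib
from collections import defaultdict


class PartitionedHash:
    """Toy partitioned hash over (word, database) keys (illustrative only)."""

    def __init__(self, word_bits=3, db_bits=2):
        # Assumed tuning knobs: how many address bits each attribute gets.
        self.word_bits = word_bits
        self.db_bits = db_bits
        # bucket address -> {(word, db): count}
        self.buckets = defaultdict(dict)

    @staticmethod
    def _bits(value, nbits):
        # Deterministic hash of a string, truncated to its low nbits bits.
        return zlib.crc32(value.encode("utf-8")) & ((1 << nbits) - 1)

    def _address(self, word, db):
        # High bits come from the word, low bits from the database name.
        high = self._bits(word, self.word_bits)
        low = self._bits(db, self.db_bits)
        return (high << self.db_bits) | low

    def put(self, word, db, count):
        self.buckets[self._address(word, db)][(word, db)] = count

    def by_word(self, word):
        # Partial match on the word: scan all 2**db_bits candidate buckets.
        prefix = self._bits(word, self.word_bits) << self.db_bits
        result = {}
        for low in range(1 << self.db_bits):
            for (w, d), c in self.buckets.get(prefix | low, {}).items():
                if w == word:
                    result[d] = c
        return result

    def by_db(self, db):
        # Partial match on the database: scan all 2**word_bits candidate buckets.
        low = self._bits(db, self.db_bits)
        result = {}
        for high in range(1 << self.word_bits):
            addr = (high << self.db_bits) | low
            for (w, d), c in self.buckets.get(addr, {}).items():
                if d == db:
                    result[w] = c
        return result
```

In this sketch the split between `word_bits` and `db_bits` is the tuning knob: giving more bits to the word shrinks the number of buckets a by-word lookup scans, at the cost of more buckets scanned per by-database lookup, and vice versa.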

Anthony Tomasic | Calvin Lue
