Paper information - Distributed processing of queries for XML documents in an agent based information retrieval system

The paper addresses the problem of efficiently querying large numbers of text documents using parallel processing methods. The optimization criteria differ somewhat from those used in querying heterogeneous databases, largely because extracting ontological information from documents dominates query execution time. We assume that each document has been previously annotated using XML. We describe the architecture of a system that processes ontology-based queries over XML-annotated documents. We introduce two basic query processing strategies, a simple strategy and a semi-join strategy, along with possible extensions that use pipelining and longer lists for keyword search. We discuss different levels of parallelism for these strategies. We build an evaluation model and use it to derive the optimal replication of resource agents, and we compare the theoretical and experimental results.

Marek Rusinkiewicz | Bogdan D. Czejdo | Malcolm C. Taylor | Ruth Miller
