Generating an XML-Based Search Index for an Effective Search of Office Documents

Office applications have become a major pillar of today's organizations because they are used to edit vast numbers of digital documents. Finding the office documents in large databases that fit users' needs is becoming increasingly important. Broad one- or two-word searches in conventional search engines often suffer from low precision, returning many irrelevant documents. To address this problem, we propose a technique that allows users to define search terms inside office documents, together with term descriptions that express the semantic relationships between the office documents and their search terms. The technique generates an XML-based search index that lets users find target office documents using search conditions defined in terms of document types, search terms, and term descriptions. We also present the schema of the proposed search index, which supports searching office documents of various types, and a search framework that uses the proposed technique.
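As a rough illustration of the idea, the sketch below builds a small XML index in which each document carries a type and a list of search terms, each paired with a term description, and then filters it by all three conditions. The element and attribute names (`searchIndex`, `document`, `term`, `type`, `path`, `description`) and the sample data are assumptions made for this example only; they are not the schema proposed in the paper.

```python
import xml.etree.ElementTree as ET


def build_index(documents):
    """Build a hypothetical XML search index from a list of document records."""
    root = ET.Element("searchIndex")
    for doc in documents:
        doc_el = ET.SubElement(
            root, "document", {"type": doc["type"], "path": doc["path"]}
        )
        for term, description in doc["terms"]:
            # The description states how the term relates to the document.
            term_el = ET.SubElement(doc_el, "term", {"description": description})
            term_el.text = term
    return root


def search(index, doc_type, term, description):
    """Return paths of documents matching the type, term, and term description."""
    hits = []
    for doc_el in index.findall(f"./document[@type='{doc_type}']"):
        for term_el in doc_el.findall("term"):
            if term_el.text == term and term_el.get("description") == description:
                hits.append(doc_el.get("path"))
                break
    return hits


# Invented sample data: the same term "Kim" plays different roles in each document.
documents = [
    {
        "type": "spreadsheet",
        "path": "budget_2010.xls",
        "terms": [("Kim", "author"), ("budget", "topic")],
    },
    {
        "type": "text",
        "path": "memo_kim.doc",
        "terms": [("Kim", "recipient")],
    },
]

index = build_index(documents)
print(search(index, "text", "Kim", "recipient"))
```

A plain keyword search for "Kim" would return both documents; adding the document type and the term description narrows the result to the one where "Kim" is the recipient, which is the kind of precision gain the abstract describes.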
