BT+-tree: A New Index for Temporal Information in Web Pages

With the growth of Web information, traditional search engines, which are built on text-based search technology, are unable to meet users' demands in Web search. Since many queries are time-related and most Web pages contain time information, developing time-aware Web search engines has become an important issue. Motivated by this, in this paper we study the indexing of temporal information in Web pages. Our work assumes that each Web page has only one primary time, which is used in time-based Web search. We present a new index structure called BT+-tree, which is based on the MAP21-tree. Unlike the MAP21-tree's double-tree structure, however, BT+-tree uses only one tree. Furthermore, BT+-tree handles duplicate keys effectively, whereas the MAP21-tree gives little consideration to them. After discussing the index structure and the manipulation algorithms of BT+-tree, we design a test program to measure its performance. The experimental results show that BT+-tree is effective for indexing temporal information in Web pages.
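The abstract does not describe BT+-tree's node layout or exactly how it treats duplicate keys. The Python below is a minimal sketch of the general problem only. It assumes each page's primary time has been reduced to one integer key (a hypothetical YYYYMMDD encoding), and it shows one common way a single ordered index can hold many pages with the same time: append the page ID as a tiebreaker so every composite key is unique. A sorted list maintained with `bisect` stands in for the leaf level of a B+-tree. This is not the paper's BT+-tree algorithm, and the class and method names are hypothetical.

```python
import bisect
from typing import List, Tuple


class PrimaryTimeIndex:
    """Hypothetical single-structure index over each page's primary time.

    Keys are (primary_time, page_id) pairs, so pages that share a primary
    time (duplicate time keys) remain distinct and stay in sorted order.
    A sorted Python list plays the role of a B+-tree's linked leaf level.
    """

    def __init__(self) -> None:
        self._keys: List[Tuple[int, str]] = []

    def insert(self, primary_time: int, page_id: str) -> None:
        # insort keeps the list ordered; the page_id tiebreaker makes every
        # composite key unique even when primary times repeat.
        bisect.insort(self._keys, (primary_time, page_id))

    def delete(self, primary_time: int, page_id: str) -> bool:
        # Locate the exact composite key; return False if it is absent.
        key = (primary_time, page_id)
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]
            return True
        return False

    def range_query(self, lo: int, hi: int) -> List[str]:
        # Return IDs of pages whose primary time t satisfies lo <= t <= hi.
        # (lo,) sorts before every (lo, page_id), and (hi + 1,) sorts after
        # every (hi, page_id), because integer times are compared first.
        start = bisect.bisect_left(self._keys, (lo,))
        end = bisect.bisect_left(self._keys, (hi + 1,))
        return [page_id for _, page_id in self._keys[start:end]]


if __name__ == "__main__":
    idx = PrimaryTimeIndex()
    idx.insert(20080115, "p1")
    idx.insert(20080115, "p2")  # same primary time as p1: a duplicate key
    idx.insert(20090301, "p3")
    print(idx.range_query(20080101, 20081231))  # ['p1', 'p2']
    idx.delete(20080115, "p1")
    print(idx.range_query(20080101, 20081231))  # ['p2']
```

Appending a tiebreaker preserves a plain B+-tree's unique-key invariant at the cost of slightly larger keys. The usual alternative is to store a list of page IDs under each distinct time, which keeps keys small but makes leaf entries variable-length.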