BitCube: A Three-Dimensional Bitmap Indexing for XML Documents

We describe a new bitmap-indexing-based technique for clustering XML documents. XML is a new standard for exchanging and representing information on the Internet, and documents can be represented hierarchically by XML elements. We represent and index XML documents using a bitmap indexing technique, define the similarity and popularity operations available in bitmap indexes, and propose a method for partitioning an XML document set. Furthermore, we extend the 2-dimensional bitmap index to a 3-dimensional bitmap index, called BitCube. We define statistical measurements in the BitCube (mean, mode, standard deviation, and correlation coefficient) and, based on these measurements, the slice, project, and dice operations on a BitCube. BitCube can be manipulated efficiently and improves the performance of document retrieval.
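The abstract does not give formulas for the 2-dimensional index or its operations, so the following is only a minimal sketch under stated assumptions. Each document becomes one row of a bitmap whose bit j is set when the document contains element path j. `similarity` is assumed to be the Jaccard ratio of shared bits to bits set in either row, `popularity` is assumed to be the fraction of documents with a given bit set, and `partition` is a hypothetical greedy one-pass grouping by a similarity threshold. All function names and the example paths are illustrative, not the paper's.

```python
from typing import Dict, List, Set


def build_bitmap(docs: Dict[str, Set[str]], elements: List[str]) -> Dict[str, int]:
    """Row per document; bit j is set iff the document contains elements[j]."""
    position = {e: j for j, e in enumerate(elements)}
    bitmap: Dict[str, int] = {}
    for doc_id, paths in docs.items():
        row = 0
        for p in paths:
            row |= 1 << position[p]
        bitmap[doc_id] = row
    return bitmap


def popcount(x: int) -> int:
    """Number of set bits in a row."""
    return bin(x).count("1")


def similarity(a: int, b: int) -> float:
    """Assumed measure: shared bits / bits set in either row (Jaccard)."""
    union = a | b
    return popcount(a & b) / popcount(union) if union else 1.0


def popularity(bitmap: Dict[str, int], j: int) -> float:
    """Assumed measure: fraction of documents whose bit j is set."""
    return sum((row >> j) & 1 for row in bitmap.values()) / len(bitmap)


def partition(bitmap: Dict[str, int], threshold: float) -> List[List[str]]:
    """Hypothetical greedy grouping: a document joins the first cluster whose
    seed row is at least `threshold` similar, otherwise it seeds a new one."""
    clusters: List[List[str]] = []
    seeds: List[int] = []
    for doc_id, row in bitmap.items():
        for k, seed in enumerate(seeds):
            if similarity(row, seed) >= threshold:
                clusters[k].append(doc_id)
                break
        else:  # no existing cluster was similar enough
            clusters.append([doc_id])
            seeds.append(row)
    return clusters


elements = ["/book/title", "/book/author", "/book/price", "/article/title"]
docs = {
    "d1": {"/book/title", "/book/author"},
    "d2": {"/book/title", "/book/author", "/book/price"},
    "d3": {"/article/title"},
}
bm = build_bitmap(docs, elements)
print(partition(bm, 0.5))  # [['d1', 'd2'], ['d3']]
```

Storing each row as a Python integer keeps the similarity test to one AND, one OR, and two bit counts, which is the kind of cheap bitwise manipulation that motivates bitmap indexing in the first place.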
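The abstract also leaves the BitCube's axes and the exact definitions of its operations unstated. The sketch below assumes the three axes are documents × element paths × words (a bit is set when a word occurs under that element path in that document) and stores the cube as nested 0/1 lists. `slice_cube`, `project`, `dice`, `row_counts`, and `correlation` are hypothetical names. Which axis each operation fixes or collapses, computing mean, mode, and standard deviation over per-document bit counts of a slice, using the population standard deviation, and using Pearson correlation between two bit columns are likewise our assumptions.

```python
from math import sqrt
from statistics import mean, mode, pstdev
from typing import List, Sequence

# Assumed layout: cube[d][e][w] == 1 iff document d contains word w
# under element path e.
Cube = List[List[List[int]]]
Matrix = List[List[int]]


def slice_cube(cube: Cube, e: int) -> Matrix:
    """Slice: fix the element axis at e, giving a documents x words bitmap."""
    return [list(doc[e]) for doc in cube]


def project(cube: Cube) -> Matrix:
    """Project: OR away the word axis, giving a documents x elements bitmap."""
    return [[int(any(words)) for words in doc] for doc in cube]


def dice(cube: Cube, ds: Sequence[int], es: Sequence[int], ws: Sequence[int]) -> Cube:
    """Dice: keep only the selected document, element and word indices."""
    return [[[cube[d][e][w] for w in ws] for e in es] for d in ds]


def row_counts(m: Matrix) -> List[int]:
    """Number of set bits in each row of a 2-D bitmap."""
    return [sum(row) for row in m]


def correlation(x: Sequence[int], y: Sequence[int]) -> float:
    """Pearson correlation of two 0/1 columns (the phi coefficient)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # constant column -> 0 by convention


# Three documents, two element paths, three words.
cube: Cube = [
    [[1, 0, 1], [0, 0, 0]],
    [[1, 1, 0], [0, 1, 0]],
    [[0, 0, 1], [1, 0, 0]],
]
s = slice_cube(cube, 0)          # [[1, 0, 1], [1, 1, 0], [0, 0, 1]]
counts = row_counts(s)           # [2, 2, 1]
print(mean(counts), mode(counts), pstdev(counts))
print(correlation([row[0] for row in s], [row[2] for row in s]))  # approx. -0.5
print(project(cube))             # [[1, 0], [1, 1], [1, 1]]
print(dice(cube, [0, 1], [0], [0, 2]))  # [[[1, 1]], [[1, 0]]]
```

Under these assumptions, projecting away the word axis recovers a 2-dimensional documents × elements bitmap of the kind used for partitioning, which is one way to read the abstract's description of BitCube as an extension of the 2-dimensional index.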
