HDDI™: Hierarchical Distributed Dynamic Indexing

The explosive growth of digital repositories of information has been enabled by recent developments in communication and information technologies. The global Internet/World Wide Web exemplifies the rapid deployment of such technologies. Despite significant accomplishments in internetworking, however, scalable indexing and data-mining techniques for computational knowledge management lag behind the rapid growth of distributed collections. Hierarchical Distributed Dynamic Indexing (HDDI™) is an approach that dynamically creates a hierarchical index from distributed document collections. At each node of the hierarchical index, a knowledge base is created and subtopic regions of semantic locality in conceptual space are identified. This chapter introduces HDDI™, focusing on the model-building techniques employed at each node of the hierarchy. It also introduces a novel approach to information clustering based on the contextual transitivity of similarity between terms. We conclude with several example applications of HDDI™ in textual data mining and information retrieval.
