Generating a dynamic hypertext environment with n-gram analysis

Methods and tools for finding documents relevant to a user’s needs in a very large corpus of documents can be found in the information retrieval, library science, and hypertext communities. Typically, these systems provide retrieval capabilities for fairly static corpora, their algorithms are language dependent, and they do not perform well when presented with misspelled words or text that has been degraded by optical character recognition techniques. In addition, the capabilities that researchers, analysts, and investigative reporters need from these methods and tools go beyond what traditional indexing systems offer. The ability to find related documents, to query using natural language, and to filter for desired information suggests a more interactive and robust user interface. We describe our work in progress for generating a dynamic hypertext environment that provides the capabilities of relating, querying, and filtering in a large-scale, dynamic environment that is robust in the sense that it is language independent and capable of dealing with misspelled words.
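To illustrate why n-gram analysis can tolerate misspellings without language-specific processing, the following is a minimal sketch and not the system described here. It assumes character trigrams and cosine similarity as the matching scheme; the function names `ngram_profile`, `cosine_similarity`, and `rank_documents`, and the sample strings, are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text, n=3):
    """Count overlapping character n-grams (assumed scheme: lowercase, padded)."""
    normalized = " ".join(text.lower().split())
    padded = f" {normalized} "  # pad so word boundaries contribute n-grams
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a, b):
    """Cosine of the angle between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query, documents, n=3):
    """Return (score, document) pairs, best match first."""
    query_profile = ngram_profile(query, n)
    scored = [
        (cosine_similarity(query_profile, ngram_profile(doc, n)), doc)
        for doc in documents
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    corpus = [
        "information retrieval systems",
        "hypertext link editing",
        "optical character recognition",
    ]
    # The query is deliberately misspelled.
    for score, doc in rank_documents("informaton retreival", corpus):
        print(f"{score:.2f}  {doc}")
```

Because a misspelled word still shares most of its trigrams with the correct spelling, the intended document keeps a high score, and nothing in the sketch depends on a dictionary, stemmer, or stop list for a particular language.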
