A Probabilistic Approach to Navigation in Hypertext

One of the main unsolved problems confronting Hypertext is the navigation problem: knowing where you are in the database graph that represents the structure of a Hypertext database, and knowing how to reach some other place in that graph that you are searching for. Previously we formalised a Hypertext database as a directed graph whose nodes represent pages of information. Central to our model is the notion of a trail, which is a path in the database graph describing some logical association amongst the pages it visits. We defined a Hypertext Query Language, HQL, over Hypertext databases and showed that in general the navigation problem, i.e. the problem of finding a trail that satisfies an HQL query (technically known as the model checking problem), is NP-complete. Herein we present a preliminary investigation of a probabilistic approach to making model checking more efficient. The flavour of our investigation is that, if some additional statistical information about the Hypertext database is available, then it can be utilised during query processing. We present two different approaches. The first approach utilises the theory of probabilistic automata. In this approach we view a Hypertext database as a probabilistic automaton, which we call a Hypertext probabilistic automaton. In such an automaton we assume that the probability of traversing a link is determined by the usage statistics of that link. We exhibit a special case in which the number of trails that satisfy a query is always finite, and we indicate how to give a finite approximation of the answer to a query in the general case. The second approach utilises the theory of random Turing machines. In this approach we view a Hypertext database as a probabilistic algorithm, realised via a Hypertext random automaton. In such an automaton we assume that, out of a choice of links, traversing any one of them is equally likely. We obtain a lower bound on the probability that a random trail satisfies a query. In principle, by iterating this probabilistic algorithm associated with the Hypertext database, the probability of finding a trail that satisfies the query can be made arbitrarily large.
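
The second approach can be illustrated with a minimal sketch, which is not the construction used in the paper. It assumes a database graph stored as an adjacency dictionary, a query modelled as a hypothetical Python predicate `satisfies` over trails (standing in for an HQL query), and a fixed bound `max_len` on trail length. The names `random_trail`, `find_trail` and `success_lower_bound` are introduced here for illustration only. If a single random trail satisfies the query with probability at least p, then k independent iterations succeed with probability at least 1 - (1 - p)^k, which is the sense in which iteration makes success arbitrarily likely.

```python
import random


def random_trail(graph, start, max_len, rng):
    """Walk from `start`, picking each outgoing link with equal probability."""
    trail = [start]
    node = start
    for _ in range(max_len - 1):
        links = graph.get(node, [])
        if not links:  # dead end: no outgoing links to follow
            break
        node = rng.choice(links)
        trail.append(node)
    return trail


def find_trail(graph, start, satisfies, max_len, iterations, seed=0):
    """Repeat the random walk; return the first trail accepted by `satisfies`."""
    rng = random.Random(seed)
    for _ in range(iterations):
        trail = random_trail(graph, start, max_len, rng)
        if satisfies(trail):
            return trail
    return None  # no satisfying trail was sampled


def success_lower_bound(p, k):
    """Lower bound on success after k independent tries, each succeeding w.p. >= p."""
    return 1.0 - (1.0 - p) ** k


# Hypothetical four-page database; the "query" asks for a trail reaching "goal".
graph = {"home": ["a", "b"], "a": ["home", "goal"], "b": ["home"], "goal": []}
found = find_trail(graph, "home", lambda t: "goal" in t, max_len=6, iterations=200)
print(found, success_lower_bound(0.25, 10))
```

In this toy graph a single walk reaches "goal" within three pages with probability 1/4, so the bound after ten iterations is already above 0.94. The uniform choice in `random_trail` corresponds to the equal-likelihood assumption of the second approach; replacing it with a choice weighted by link usage statistics would move the sketch towards the first approach.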
