An infrastructure for open latent semantic linking

The more the web grows, the harder it is for users to find the information they need, and the harder it becomes to identify when documents are related. To find out that two or more documents are in fact related, users have to navigate through the documents and analyze their content. This paper presents an infrastructure that uses latent semantic analysis and open hypermedia concepts to identify relationships among web pages automatically. Latent semantic analysis has been proposed by the information retrieval community as a way to automatically organize text objects into a semantic structure appropriate for matching. In open hypermedia systems, links are managed and stored in a special database, a linkbase, which allows hypermedia functionality to be added to a document without changing the document's original structure and format. We first present two complementary link-related efforts: an extensible latent semantic indexing service and an open linkbase service. Building on those efforts, we present an infrastructure that identifies latent semantic links within web repositories and makes them available in an open linkbase. To demonstrate the utility of our open infrastructure by example, we built an application that presents a directory of semantic links extracted from web sites.
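To make the two ideas concrete, the following is a minimal sketch, not the infrastructure described in this paper. It assumes raw term counts, a rank-k latent space obtained from the eigenvectors of the document-by-document matrix (equivalent to a truncated singular value decomposition), and a cosine threshold for deciding when two pages are related. All names (`build_linkbase`, `links_for`, the sample URLs and the threshold value) are hypothetical and chosen only for illustration. The links are kept in a separate list, standing in for a linkbase, so the documents themselves are never modified.

```python
import math
from collections import Counter


def term_document_matrix(texts):
    """Rows are terms, columns are documents; entries are raw term counts."""
    counts = [Counter(text.lower().split()) for text in texts]
    vocab = sorted(set().union(*counts))
    return [[c[term] for c in counts] for term in vocab]


def gram_matrix(a):
    """Document-by-document matrix A^T A of term-vector dot products."""
    n = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]


def top_eigenpairs(sym, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(sym)
    m = [row[:] for row in sym]
    pairs = []
    for idx in range(k):
        v = [1.0 + 0.1 * ((i + idx) % n) for i in range(n)]  # deterministic start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def latent_coordinates(texts, k):
    """Coordinates of each document in the rank-k latent space (rows of V_k * Sigma_k)."""
    pairs = top_eigenpairs(gram_matrix(term_document_matrix(texts)), k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs]
            for j in range(len(texts))]


def cosine(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def build_linkbase(docs, k=2, threshold=0.5):
    """Return link records stored apart from the documents they connect."""
    urls = list(docs)
    coords = latent_coordinates([docs[u] for u in urls], k)
    linkbase = []
    for i in range(len(urls)):
        for j in range(i + 1, len(urls)):
            score = cosine(coords[i], coords[j])
            if score >= threshold:
                linkbase.append({"source": urls[i], "target": urls[j],
                                 "type": "latent-semantic", "score": score})
    return linkbase


def links_for(linkbase, url):
    """Look up the pages linked to `url` without touching the page itself."""
    return [l["target"] if l["source"] == url else l["source"]
            for l in linkbase if url in (l["source"], l["target"])]


# Hypothetical sample pages, used only to exercise the sketch.
docs = {
    "a.html": "latent semantic analysis of text documents",
    "b.html": "semantic analysis of web documents",
    "c.html": "cooking pasta with tomato sauce",
}
linkbase = build_linkbase(docs, k=2, threshold=0.5)
print(links_for(linkbase, "a.html"))
```

Because the link records live outside the pages, a viewer or proxy could overlay them at display time, which is the property the open hypermedia approach relies on. A production system would normally weight terms (for example with tf-idf) and use a numerical library for the decomposition; both are left out here to keep the sketch self-contained.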

[1]  Berthier A. Ribeiro-Neto,et al.  Link-based and content-based evidential information in a belief network model , 2000, SIGIR '00.

[2]  Peter Ørbæk,et al.  Webvise: Browser and Proxy Support for Open Hypermedia Structuring Mechanisms on the World Wide Web , 1999, Comput. Networks.

[3]  Randall H. Trigg,et al.  From Web to Workplace: Designing Open Hypermedia Systems , 1999 .

[4]  Gregory D. Abowd,et al.  Linking Homogeneous Web-based Repositories , 2001, Workshop on Information Integration on the Web.

[5]  James Allan,et al.  Automatic hypertext link typing , 1996 .

[6]  Wendy Hall,et al.  Conceptual linking: ontology-based open hypermedia , 2001, WWW '01.

[7]  Gregory D. Abowd,et al.  Anchoring discussions in lecture: an approach to collaboratively extending classroom digital media , 1999, CSCL.

[8]  Steven J. DeRose,et al.  Xml pointer language (xpointer) , 1998 .

[9]  Eric Prud'hommeaux,et al.  Annotea: an open RDF infrastructure for shared Web annotations , 2002, Comput. Networks.

[10]  Stephen J. Green,et al.  Building Hypertext Links By Computing Semantic Similarity , 1999, IEEE Trans. Knowl. Data Eng..

[11]  Douglas Tudhope,et al.  Semantically indexed hypermedia: linking information disciplines , 1999, CSUR.

[12]  Gerard Salton,et al.  A blueprint for automatic indexing , 1981, SIGF.

[13]  Gregory D. Abowd,et al.  Classroom 2000: An Experiment with the Instrumentation of a Living Educational Environment , 1999, IBM Syst. J..

[14]  Hugh C. Davis,et al.  MICROCOSM: An Open Model for Hypermedia with Dynamic Linking , 1990, ECHT.

[15]  Gerard Salton,et al.  Another look at automatic text-retrieval systems , 1986, CACM.

[16]  Haym Hirsh,et al.  Using LSI for text classification in the presence of background text , 2001, CIKM '01.

[17]  Gregory D. Abowd,et al.  Linking by interacting: a paradigm for authoring hypertext , 2000, HYPERTEXT '00.

[18]  Randall H. Trigg,et al.  Toward a Dexter-based model for open hypermedia: unifying embedded references and link objects , 1996, HYPERTEXT '96.

[19]  Mayer D. Schwartz,et al.  The Dexter Hypertext Reference Model , 1994, CACM.

[20]  S. T. Dumais,et al.  Using latent semantic analysis to improve access to textual information , 1988, CHI '88.

[21]  Marja-Riitta Koivunen,et al.  Annotea: an open RDF infrastructure for shared Web annotations , 2001, WWW '01.

[22]  James Allan,et al.  Selective text utilization and text traversal , 1993, Int. J. Hum. Comput. Stud..

[23]  Steven J. DeRose,et al.  XML Pointer Language (XPointer) Version 1. 0. World Wide Web Consortium, Working Draft WD - xptr - 2 , 2001 .

[24]  Bill N. Schilit,et al.  Linking by inking: trailblazing in a paper-like hypertext , 1998, HYPERTEXT '98.

[25]  Gregory D. Abowd,et al.  Supporting long-term educational activities through dynamic web interfaces , 2001 .

[26]  Les Carr,et al.  The Distributed Link Service: A Tool for Publishers, Authors, and Readers , 1995, WWW.

[27]  Maria da Graça Campos Pimentel,et al.  Latent semantic linking over homogeneous repositories , 2001, DocEng '01.

[28]  Les Carr,et al.  Linking in context , 2001, J. Digit. Inf..

[29]  Rodolfo Soto,et al.  Learning and performing by exploration: label quality measured by latent semantic analysis , 1999, CHI '99.

[30]  Lennart Björneborn Small-world linkage and co-linkage , 2001, HYPERTEXT '01.

[31]  Santosh S. Vempala,et al.  Latent semantic indexing: a probabilistic analysis , 1998, PODS '98.

[32]  Mark Guzdial Supporting Learners as users , 1999, ASTR.

[33]  Gene Golovchinsky,et al.  What the query told the link: the integration of hypertext and information retrieval , 1997, HYPERTEXT '97.

[34]  Peter Ørbæk,et al.  Webvise: Browser and Proxy Support for Open Hypermedia Structuring Mechanisms on the WWW , 1998 .

[35]  Katy Börner,et al.  Extracting and visualizing semantic structures in retrieval results for browsing , 2000, DL '00.

[36]  Alan F. Smeaton,et al.  Automatic link generation , 1999, CSUR.