A look at some issues during textual linking of homogeneous web repositories

By interacting with services on the Web that create links automatically, users can identify relationships among documents stored in different repositories. Because automatic linking services do not rely on queries formulated by a human user, the way information retrieval techniques are used to identify relationships is affected. These techniques can identify relationships that should not have been generated (non-relevant links) while failing to identify all relevant relationships (poor recall). To improve the quality of the relationships identified, we have investigated some design issues that arise during the automatic linking of textual repositories. The investigations used a collection of documents from online Brazilian newspapers and the Cystic Fibrosis Collection. Their results have defined procedures, infrastructures and, consequently, the requirements for a configurable linking service, which is also made available as a contribution of this work.
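
The trade-off between non-relevant links and missed links can be illustrated with a minimal sketch. It is not the linking service described in this work: the function names, the whitespace tokenization, the raw term-frequency cosine similarity and the threshold are all assumptions chosen only to show how generated links might be scored against a hand-labelled set of relevant links.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two raw term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def generate_links(docs: dict, threshold: float) -> set:
    """Link every pair of documents whose similarity reaches the threshold.

    Hypothetical linker: no human query is involved, only document text.
    """
    vectors = {name: Counter(text.lower().split()) for name, text in docs.items()}
    return {
        frozenset((x, y))
        for x, y in combinations(vectors, 2)
        if cosine(vectors[x], vectors[y]) >= threshold
    }


def precision_recall(generated: set, relevant: set) -> tuple:
    """Precision: share of generated links that are relevant.
    Recall: share of relevant links that were generated."""
    hits = len(generated & relevant)
    precision = hits / len(generated) if generated else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Under these assumptions, lowering the threshold tends to generate more links, which can raise recall at the cost of more non-relevant links, while raising it tends to have the opposite effect.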
