Automatic generation of inter-passage links based on semantic similarity

This paper investigates the use of semantic similarity measures, and their potential for predicting links, in the automatic generation of links between documents and passages. First, the correlation between the way people link content and the scores produced by standard semantic similarity measures is investigated. The relation between semantic similarity and document length is then analysed. Based on these findings, a new method for link generation is formulated and tested.
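The abstract does not say which similarity measures are evaluated or how the new method works. As a minimal sketch of the general setup only, assuming a plain TF-IDF vector space model with cosine similarity (one standard measure, not necessarily the one used in the paper), the Python below scores every pair of passages and proposes a link wherever the score reaches a threshold. It then checks how well the scores agree with human-made links by computing a Pearson correlation against 0/1 link labels, which is equivalent to a point-biserial correlation. The passages, the `human_links` set, the 0.2 threshold, and all function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    # Lowercase whitespace tokenization; a real system would also
    # strip punctuation and stop words.
    return text.lower().split()


def tfidf_vectors(passages):
    # Build one sparse TF-IDF vector (a dict term -> weight) per passage.
    docs = [Counter(tokenize(p)) for p in passages]
    n = len(docs)
    df = Counter(term for d in docs for term in d)
    # idf = log(n / df): a term that occurs in every passage gets weight 0.
    idf = {t: math.log(n / df[t]) for t in df}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(u, v):
    # Cosine similarity of two sparse vectors; 0.0 if either is empty.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def pearson(xs, ys):
    # Pearson correlation; returns 0.0 when either series is constant.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def score_pairs(passages):
    # Similarity score for every unordered pair (i, j) with i < j.
    vecs = tfidf_vectors(passages)
    return {(i, j): cosine(vecs[i], vecs[j])
            for i, j in combinations(range(len(passages)), 2)}


def propose_links(scores, threshold):
    # Propose a link for each pair whose similarity reaches the threshold.
    return sorted(pair for pair, s in scores.items() if s >= threshold)


# Hypothetical example data: three passages, one human-made link.
passages = [
    "semantic similarity measures for link generation",
    "link generation using semantic similarity",
    "recipes for baking bread at home",
]
human_links = {(0, 1)}

scores = score_pairs(passages)
pairs = sorted(scores)
sims = [scores[p] for p in pairs]
labels = [1.0 if p in human_links else 0.0 for p in pairs]

r = pearson(sims, labels)                   # agreement with human linking
links = propose_links(scores, threshold=0.2)
print(links, round(r, 3))
```

On this toy input only passages 0 and 1 clear the threshold, so `links` is `[(0, 1)]`, and the correlation with the single human link is strongly positive. Because cosine similarity divides by both vector norms, it already compensates for some differences in passage length, which is one reason the interaction between similarity scores and document length is worth measuring separately rather than assuming away.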
