Discovering missing links in Wikipedia

In this paper we address the problem of discovering missing hypertext links in Wikipedia. The method we propose consists of two steps: first, we compute a cluster of highly similar pages around a given page; second, we identify links on those similar pages that are candidates for links missing from the given page. The main innovation is LTRank, the algorithm we use to identify similar pages, which ranks pages using co-citation and page title information. We evaluate both LTRank and the link discovery method manually; both show acceptable results, particularly given the simplicity of the methods and the conservativeness of the evaluation criteria.
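The two-step idea can be illustrated with a minimal sketch. This is not LTRank: the abstract does not give its scoring details, so the sketch substitutes plain co-citation counts and ignores page titles. The function names, the `k` and `min_support` parameters, and the toy link graph are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Toy link graph (hypothetical data): page -> set of pages it links to.
out_links = {
    "Hub1": {"A", "B", "C"},
    "Hub2": {"A", "B"},
    "Hub3": {"A", "C"},
    "A": {"X"},
    "B": {"X", "Y", "Z"},
    "C": {"X", "Y"},
}

def similar_pages(page, out_links, k=2):
    """Step 1 (stand-in for LTRank): rank pages by co-citation count,
    i.e. the number of source pages that link to both `page` and the other page."""
    counts = Counter()
    for targets in out_links.values():
        if page in targets:
            for other in targets:
                if other != page:
                    counts[other] += 1
    # Sort by descending count, then by name, so ties are deterministic.
    return sorted(counts, key=lambda p: (-counts[p], p))[:k]

def candidate_links(page, out_links, k=2, min_support=2):
    """Step 2: propose links found on at least `min_support` of the
    similar pages but absent from `page` itself."""
    existing = out_links.get(page, set())
    support = Counter()
    for neighbour in similar_pages(page, out_links, k):
        for target in out_links.get(neighbour, set()):
            if target != page and target not in existing:
                support[target] += 1
    return sorted(t for t, n in support.items() if n >= min_support)

print(similar_pages("A", out_links))    # ['B', 'C']
print(candidate_links("A", out_links))  # ['Y']
```

In this toy graph, pages B and C are co-cited with A twice each, and both link to Y while A does not, so Y is proposed as a candidate missing link on A. Requiring support from several similar pages is one simple way to keep the candidate list conservative.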
