Can the Web turn into a digital library?

There is no doubt that the enormous amount of information on the WWW is influencing how we work, live, learn and think. However, information on the WWW is in general too chaotic, not reliable enough, and specific material is often too difficult to locate for the Web to be considered a serious digital library. In this paper we concentrate on the question of how reliable information can be retrieved from the Web, a task that is fraught with problems but essential if the WWW is to serve as a serious digital library. It turns out that the use of search engines carries many dangers. We point out some of the ways in which those dangers can be reduced and dangerous traps avoided. Another approach to finding useful information on the Web is to use “classical” resources of information such as specialized dictionaries, lexica or encyclopaedias in electronic form, for example the Britannica. Although it seemed for a while that such resources might more or less disappear from the Web due to efforts such as Wikipedia, some of the classical encyclopaedias and specialized offerings have picked up steam again and should not be ignored. They do, however, sometimes suffer from what we call the “wishy-washy” syndrome explained in this paper. It is interesting to note that Wikipedia, which (at least in its English version) is also larger than all other encyclopaedias, is less afflicted by this syndrome, yet has some other serious drawbacks. We discuss how those drawbacks could be avoided and present a system, halfway between prototype and production system, that takes care of many of the aforementioned problems and hence may serve as a model for further efforts to turn (part of) the Web into a usable digital library.
