Low-cost semantic enhancement to digital library metadata and indexing: Simple yet effective strategies

Most existing digital libraries use traditional lexically based retrieval techniques. For established systems, completely replacing, or even significantly changing, the document retrieval mechanism (document analysis, indexing strategy, query processing and query interface) would require major technological effort and would most likely be disruptive. In this paper, we describe ways to use the results of semantic analysis and disambiguation while retaining an existing keyword-based search and lexicographic index. We engineer this so that the output of semantic analysis, which is performed off-line, can be imported directly into existing digital library metadata and index structures, and thus incorporated without modifying the system architecture.
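
The abstract does not specify how the off-line semantic output is encoded for import. One plausible approach, sketched here only as an illustration and not necessarily the authors' method, is to turn each disambiguated term into a single sense-tagged keyword and store those keywords in an extra metadata field. An unmodified keyword indexer then picks them up like any other text. The `term_sense` token format, the `semantic` field name, the toy `KeywordIndex` standing in for an existing lexical index, and the example documents are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase word tokenizer used by the existing, unmodified keyword index."""
    return re.findall(r"[a-z0-9_]+", text.lower())


class KeywordIndex:
    """Hypothetical stand-in for an existing lexical inverted index.
    It has no notion of word senses; it only maps tokens to document IDs."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        # Every metadata field is tokenized and indexed in the same way.
        for value in fields.values():
            for token in tokenize(value):
                self.postings[token].add(doc_id)

    def search(self, query):
        # Conjunctive (AND) keyword query over the posting lists.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


def sense_token(term, sense):
    """Hypothetical encoding of a disambiguated term as one indexable keyword,
    e.g. ("Jaguar", "animal") -> "jaguar_animal"."""
    return re.sub(r"[^a-z0-9]+", "_", f"{term} {sense}".lower()).strip("_")


def enrich(fields, annotations):
    """Copy a metadata record and add the off-line annotations as an extra field."""
    enriched = dict(fields)
    enriched["semantic"] = " ".join(sense_token(t, s) for t, s in annotations)
    return enriched


# Assumed output of an off-line disambiguation pass: (term, sense) pairs per document.
annotations = {
    "d1": [("jaguar", "animal")],
    "d2": [("jaguar", "car")],
}
records = {
    "d1": {"text": "The jaguar hunted in the rainforest at night."},
    "d2": {"text": "The new Jaguar sedan has a powerful engine."},
}

# The indexing code is unchanged; only the records it receives are enriched.
index = KeywordIndex()
for doc_id, fields in records.items():
    index.add(doc_id, enrich(fields, annotations.get(doc_id, [])))

print(sorted(index.search("jaguar")))         # ['d1', 'd2']  plain keyword search still works
print(sorted(index.search("jaguar_animal")))  # ['d1']        sense-restricted search
```

Because the sense tags are ordinary tokens, neither the index structure nor the query processor has to change. The trade-off is that sense-restricted searches depend on the query (or a front end that rewrites it) using the same token format as the import step.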