Opportunities and challenges in enhancing access to metadata of cultural heritage collections: a survey

Machine-processable data that describe digital or non-digital resources are termed metadata. Different metadata standards exist for describing various types of digital objects, and several studies have reported on how to address issues in accessing metadata resources. Most studies on metadata involve the cultural heritage domain, which indicates the importance of this domain in metadata research and development. Research on metadata in cultural heritage mainly revolves around three fundamental issues: (1) the poor quality of metadata content in most cases, (2) the difficulty of accessing metadata content, due largely to users' limited knowledge of that content, and (3) the heterogeneity of the data at the schema level, which makes access even more difficult. Poor metadata quality makes it difficult for users to retrieve and explore information that satisfies their needs. To make metadata content more accessible, the content itself must be enhanced, especially for cultural heritage collections, which consist of digital objects (structured documents) described by a variety of metadata schemas. This paper presents issues and challenges in enhancing access to metadata by reviewing existing approaches in the metadata environment, with a particular emphasis on cultural heritage collections. We first look at how metadata access is classified into two categories, data retrieval and information retrieval (IR). We then present the analysis, findings and suggestions on how to address issues in enhancing access to metadata content, especially in cultural heritage collections. A detailed comparison between information retrieval and data retrieval is given, focusing on the applicability of one approach over the other. A framework that aims to improve retrieval effectiveness when searching metadata is also proposed and tested.
The proposed framework consists of approaches and methods that are expected to enhance access to metadata, especially in cultural heritage collections, and to be useful for those with limited knowledge of cultural heritage. The experiments were conducted on CHiC2013, a cultural heritage collection. The results show a considerable improvement over other IR approaches that use expansion methods.
