High-Performance Annotation Tagging over Solr Full-text Indexes

In this work, we focus on the problem of “annotation tagging” over Information Spaces of objects stored in a full-text index. In this scenario, “data curator” users assign tags to objects for classification purposes, while generic end-users perceive tags as searchable and browsable object properties. To carry out their activities, data curators need “annotation tagging tools” that let them “bulk” tag or untag large sets of objects in temporary work sessions, where they can experiment “virtually” and in “real time” with the effects of their actions before making the changes visible to end-users. Implementing such tools over full-text indexes is challenging, since bulk object updates in this context are far from real-time and, in critical cases, may slow down index performance. We devised TagTick, a tool that offers data curators a fully functional annotation tagging environment over the full-text index Apache Solr, regarded as a “de facto standard” in this area. TagTick consists of a TagTick Virtualizer module, which extends the Solr APIs to support real-time, virtual, bulk-tagging operations, and a TagTick User Interface module, which offers end-user functionalities for annotation tagging. The tool scales optimally with the number and size of bulk tag operations, without compromising index performance.
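To make the idea of a “virtual” work session concrete, the following minimal Python sketch keeps pending bulk tag and untag operations in an in-memory overlay and merges them with the committed tags at query time. It is an illustration under stated assumptions, not TagTick's actual implementation: the class `TaggingSession`, its methods, and the plain dictionary standing in for the Solr index are all hypothetical names introduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaggingSession:
    """Hypothetical work session: pending edits are kept apart from the index."""

    # Object id -> committed tags. A plain dict stands in for the full-text index.
    base: dict[str, set[str]]
    # Tag -> object ids tagged (or untagged) in this session, not yet committed.
    added: dict[str, set[str]] = field(default_factory=dict)
    removed: dict[str, set[str]] = field(default_factory=dict)

    def bulk_tag(self, ids, tag):
        """Record a bulk tag; cost depends on the overlay, not on index updates."""
        ids = set(ids)
        self.added.setdefault(tag, set()).update(ids)
        self.removed.get(tag, set()).difference_update(ids)

    def bulk_untag(self, ids, tag):
        """Record a bulk untag, cancelling any pending tag on the same objects."""
        ids = set(ids)
        self.removed.setdefault(tag, set()).update(ids)
        self.added.get(tag, set()).difference_update(ids)

    def search(self, tag):
        """Return the ids carrying `tag` as the curator would see them."""
        committed = {i for i, tags in self.base.items() if tag in tags}
        return (committed | self.added.get(tag, set())) - self.removed.get(tag, set())

    def commit(self):
        """Apply all pending edits to the index, making them visible to end-users."""
        for tag, ids in self.added.items():
            for i in ids:
                self.base.setdefault(i, set()).add(tag)
        for tag, ids in self.removed.items():
            for i in ids:
                self.base.get(i, set()).discard(tag)
        self.added.clear()
        self.removed.clear()


# Usage: the curator sees the new tags at once; the index changes only on commit.
index = {"doc1": {"physics"}, "doc2": set(), "doc3": {"physics"}}
session = TaggingSession(index)
session.bulk_tag(["doc1", "doc2"], "open-access")
session.bulk_untag(["doc3"], "physics")
print(sorted(session.search("open-access")))
print(index["doc2"])
session.commit()
print(index["doc2"])
```

In this sketch, end-users querying `index` directly are unaffected until `commit()` runs, while the curator's view through `search()` reflects every pending operation. A real deployment over Solr would have to translate the overlay into query rewriting and deferred index updates, which is beyond what this example attempts.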
