Charting the digital library evaluation domain with a semantically enhanced mining methodology

The digital library evaluation field is evolving and is characterized by a marked tendency to embrace diverse methodological orientations. Because the scientific literature in this domain is vast, researchers need tools that reveal either commonly accepted practices or areas for further investigation. In this paper, a data mining methodology is proposed to identify prominent patterns in the evaluation of digital libraries. Using machine learning techniques, all papers presented at the ECDL and JCDL conferences between 2001 and 2011 were categorized as relevant or non-relevant to the digital library (DL) evaluation domain. The relevant papers were then semantically annotated according to the Digital Library Evaluation Ontology (DiLEO) vocabulary. The resulting set of annotations was clustered into evaluation patterns for the most frequently used tools, methods and goals of the domain. Our findings highlight the expressive nature of DiLEO, emphasize semantic annotation as a necessary step in handling domain-centric corpora, and underline the potential of the proposed methodology for profiling evaluation activities.
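
As an illustration of the first step of the methodology (categorizing papers as relevant or non-relevant), the following is a minimal sketch of a text classifier. It assumes a multinomial Naive Bayes model with add-one smoothing; the abstract states only that machine learning techniques were used, so the choice of model, the function names (tokenize, train, predict) and the four toy training texts are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (text, label). Not data from the paper.
TRAIN = [
    ("user study evaluation of digital library usability", "relevant"),
    ("log analysis to evaluate search interface effectiveness", "relevant"),
    ("metadata harvesting protocol architecture", "non-relevant"),
    ("scalable indexing architecture for repositories", "non-relevant"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train(docs):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = {}
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log posterior under Naive Bayes."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior of the class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothed word likelihood.
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
label_a = predict(model, "usability evaluation with a user study")
label_b = predict(model, "repository indexing architecture")
```

In a real setting the training set would be a manually labeled sample of conference papers, and the later steps (semantic annotation with DiLEO and clustering of the annotations) would operate only on the papers predicted as relevant.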
