Unifying the Concept of Collection in Digital Libraries

The notion of collection plays a key role in Digital Libraries, where several kinds of collections are typically found. We claim that all these kinds can be unified into a single abstraction mechanism, endowed with an extension and an intension, similarly to predicates in logic. The extension of a collection is the set of documents that are members of the collection at a given point in time, while the intension is a description of the meaning of the collection, that is, the characteristic property that the members of the collection share and that distinguishes the collection from other collections. The problem then arises of how to derive the intension automatically from a given extension, a problem that must be solved, for example, when a collection is created from a set of documents. It turns out that our notion of collection is very close to the notion of formal concept in Formal Concept Analysis, which provides a well-founded framework for formalizing the problem and useful tools for solving it. We exploit this framework to study the problem of automatically deriving a collection intension from a given extension. We then show how intensions can be exploited for carrying out basic tasks on collections, establishing a connection between Digital Library management and data integration.
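To make the extension/intension relationship concrete, the following minimal sketch uses the two standard derivation operators of Formal Concept Analysis on a toy formal context. It is an illustration under stated assumptions, not the method developed in the paper: the document identifiers, the attribute names, and the functions `intension` and `extent` are all hypothetical, and the intension is restricted here to a plain conjunction of attributes.

```python
from typing import Dict, Set

# Hypothetical formal context: each document is mapped to the attributes it has.
Context = Dict[str, Set[str]]

def intension(context: Context, extension: Set[str]) -> Set[str]:
    """Return the attributes shared by every document in the extension."""
    docs = list(extension)
    if not docs:
        # FCA convention: the empty set of objects shares all attributes.
        return set().union(*context.values())
    shared = set(context[docs[0]])
    for doc in docs[1:]:
        shared &= context[doc]
    return shared

def extent(context: Context, attrs: Set[str]) -> Set[str]:
    """Return every document that has all the given attributes."""
    return {doc for doc, doc_attrs in context.items() if attrs <= doc_attrs}

# Toy data, invented for illustration only.
context: Context = {
    "d1": {"lang:en", "topic:fca", "type:article"},
    "d2": {"lang:en", "topic:fca", "type:thesis"},
    "d3": {"lang:en", "topic:ir", "type:article"},
    "d4": {"lang:fr", "topic:fca", "type:article"},
}

# {d1, d2} is exactly described by its shared attributes: a formal concept.
precise = intension(context, {"d1", "d2"})
precise_closure = extent(context, precise)

# {d2, d3} share only "lang:en", which d1 also satisfies, so the
# derived description is strictly broader than the given extension.
loose = intension(context, {"d2", "d3"})
loose_closure = extent(context, loose)
```

In this sketch, a set of documents coincides with the extent of its derived intension only when it is the extent of a formal concept; otherwise the derived description also covers documents outside the given set, which is one way to see why deriving a good intension from an arbitrary extension is a non-trivial problem.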
