Identification of FRBR Works Within Bibliographic Databases: An Experiment with UNIMARC and Duplicate Detection Techniques

Many experiments and studies have examined FRBR as an implementation model for bibliographic databases, with the aim of improving resource discovery services and conveying a clearer picture of the information spaces represented in catalogues. One such application is identifying the FRBR work that several bibliographic records share. In this paper we evaluate how well string-similarity techniques, used mainly by the database research community for duplicate detection, apply to this problem. We describe the particularities of applying these techniques to bibliographic data and empirically compare their results with those of current techniques, which rely on exact matching. Experiments on the Portuguese national union catalogue show a significant improvement over the approaches currently in use.
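The contrast between exact matching and similarity-based matching can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the record layout (plain dictionaries with `author` and `title`), the helper names, the use of Python's `difflib.SequenceMatcher` as the similarity measure, and the 0.85 threshold are all assumptions chosen for illustration.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def work_key(record: dict) -> str:
    """Hypothetical work key: normalized author plus normalized title."""
    return normalize(record["author"]) + " / " + normalize(record["title"])


def exact_match(a: dict, b: dict) -> bool:
    """Baseline: two records share a work only if their keys are identical."""
    return work_key(a) == work_key(b)


def similar_match(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Similarity-based: keys may differ slightly (threshold is an assumption)."""
    ratio = SequenceMatcher(None, work_key(a), work_key(b)).ratio()
    return ratio >= threshold


# Illustrative records, not taken from the catalogue used in the experiments.
records = [
    {"author": "Queirós, Eça de", "title": "Os Maias"},
    {"author": "Queiroz, Eça de", "title": "Os Maias"},  # spelling variant
    {"author": "Pessoa, Fernando", "title": "Mensagem"},
]

for i in range(len(records)):
    for j in range(i + 1, len(records)):
        a, b = records[i], records[j]
        print(i, j, "exact:", exact_match(a, b), "similar:", similar_match(a, b))
```

In this sketch the first two records differ by a single character in the author's name, so exact matching keeps them apart while the similarity test groups them. A real deployment would need field-specific measures and a threshold tuned on labelled data.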