An event is a short, data-rich document that is an instance of an announcement type such as "wedding", "birth", "graduation", "auction", "obituary", or "divorce". In this research we focus on events reported in newspapers. Obtaining data from events involves three steps: (1) collecting a set of events for a given announcement type from a group of newspapers, (2) extracting the features (data) of each event and building event records, and (3) approximately matching the event records against an existing customer database. We completed the first and second steps and reported them previously; the third step is the focus of this paper. The approximate matching scheme introduced in this paper is weight-based: the degrees of membership of specific attribute values of an event record partially influence the discrimination among the candidate records. This work is an exploratory study that constitutes the last part of a larger research project. Although the data sample used to test the proposed scheme is small, the results indicate that the scheme is effective for approximate matching.
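As a rough illustration of what weight-based approximate matching of an event record against customer records can look like, the following minimal sketch scores each candidate by a weighted sum of per-attribute string similarities. It is not the scheme evaluated in this paper: the attribute names, the weights, the threshold, and the use of `difflib.SequenceMatcher` as a stand-in for a degree of membership are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical attribute weights (assumed for illustration; they sum to 1).
WEIGHTS = {"last_name": 0.5, "first_name": 0.3, "city": 0.2}


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1].

    Used here as a stand-in for a degree of membership.
    A missing (empty) value contributes no evidence.
    """
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(event: dict, customer: dict) -> float:
    """Weighted sum of per-attribute similarities."""
    return sum(
        weight * similarity(event.get(attr, ""), customer.get(attr, ""))
        for attr, weight in WEIGHTS.items()
    )


def best_match(event: dict, customers: list, threshold: float = 0.8):
    """Return (customer, score) for the top candidate, or None if no
    candidate reaches the (assumed) acceptance threshold."""
    if not customers:
        return None
    score, customer = max(
        ((match_score(event, c), c) for c in customers),
        key=lambda pair: pair[0],
    )
    return (customer, score) if score >= threshold else None


if __name__ == "__main__":
    event = {"last_name": "Jonson", "first_name": "Mary", "city": "Savannah"}
    customers = [
        {"last_name": "Johnson", "first_name": "Mary", "city": "Savannah"},
        {"last_name": "Smith", "first_name": "Paul", "city": "Atlanta"},
    ]
    print(best_match(event, customers))
```

In this sketch a misspelled surname lowers the score only in proportion to the weight of that attribute, so a record that agrees on the remaining attributes can still be selected.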