We introduce ClueMaker, the first language designed specifically for approximate record matching. Clues written in ClueMaker predict whether two records denote the same entity based on the values of the records' attributes. For example, a clue may predict a match if the two records have identical values for the first-name attribute. The values of the clues can then be used as input to a machine-learning technique that computes a match probability. ClueMaker is based on Java and is compiled to Java source code or bytecode. As a result, ClueMaker is accessible to many programmers, can integrate any Java class, runs on virtually any platform, supports Unicode, and is more readily accepted by IT departments that try to minimize the number of distinct languages in use. ChoiceMaker Technologies has used ClueMaker successfully over the past two years in a variety of approximate record matching tasks.
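To make the clue-then-learner pipeline concrete, here is a minimal sketch in plain Java. It is not ClueMaker syntax and not ChoiceMaker's implementation. It assumes records are `Map<String, String>` values, and the attribute names (`firstName`, `lastName`, `dateOfBirth`), the three clues, and the weights are all hypothetical. Because the abstract does not name the machine-learning technique, a simple logistic model stands in for it here: each clue yields a 0/1 value, and a weighted sum of those values is squashed into a match probability.

```java
import java.util.List;
import java.util.Map;

/**
 * Plain-Java illustration of clues feeding a learned model (NOT ClueMaker syntax).
 * Assumptions: records are Map<String, String>; attribute names, clues and weights are hypothetical.
 */
public class ClueSketch {

    /** A clue inspects two records and reports whether it fires. */
    interface Clue {
        boolean fires(Map<String, String> a, Map<String, String> b);
    }

    /** True when both records have the same non-empty value for the attribute. */
    static boolean sameValue(Map<String, String> a, Map<String, String> b, String attr) {
        String x = a.get(attr);
        String y = b.get(attr);
        return x != null && !x.isEmpty() && x.equals(y);
    }

    /** True when both records have non-empty values for the attribute and they differ. */
    static boolean differentValue(Map<String, String> a, Map<String, String> b, String attr) {
        String x = a.get(attr);
        String y = b.get(attr);
        return x != null && y != null && !x.isEmpty() && !y.isEmpty() && !x.equals(y);
    }

    // Hypothetical clues: the first two predict a match, the third predicts a non-match.
    static final List<Clue> CLUES = List.of(
        (a, b) -> sameValue(a, b, "firstName"),
        (a, b) -> sameValue(a, b, "lastName"),
        (a, b) -> differentValue(a, b, "dateOfBirth"));

    // Hypothetical weights; in practice they would be learned from labeled record pairs.
    static final double BIAS = -2.0;
    static final double[] WEIGHTS = {1.5, 2.5, -3.0};

    /** Evaluates every clue on the pair, producing a 0/1 value per clue. */
    static int[] clueValues(Map<String, String> a, Map<String, String> b) {
        int[] values = new int[CLUES.size()];
        for (int i = 0; i < values.length; i++) {
            values[i] = CLUES.get(i).fires(a, b) ? 1 : 0;
        }
        return values;
    }

    /** Combines the clue values with a logistic model to obtain a match probability. */
    static double matchProbability(Map<String, String> a, Map<String, String> b) {
        int[] x = clueValues(a, b);
        double score = BIAS;
        for (int i = 0; i < x.length; i++) {
            score += WEIGHTS[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-score));
    }

    public static void main(String[] args) {
        Map<String, String> r1 = Map.of("firstName", "Ann", "lastName", "Lee", "dateOfBirth", "1970-01-02");
        Map<String, String> r2 = Map.of("firstName", "Ann", "lastName", "Lee", "dateOfBirth", "1970-01-02");
        Map<String, String> r3 = Map.of("firstName", "Ann", "lastName", "Lee", "dateOfBirth", "1985-07-30");
        System.out.printf("identical records: %.3f%n", matchProbability(r1, r2));
        System.out.printf("different DOB:     %.3f%n", matchProbability(r1, r3));
    }
}
```

With these made-up weights, the identical pair scores about 0.88 and the pair with conflicting birth dates about 0.27. Keeping the clues separate from the weights mirrors the division described above: clue authors encode domain knowledge, while the learner decides how much each clue should count.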