We propose a kernel method that uses combinations of features across example pairs to learn pairwise classifiers. Pairwise classifiers, which decide whether two examples belong to the same class, are important components of duplicate detection, entity matching, and other clustering applications. Existing methods for learning pairwise classifiers from labeled training data rely on string edit distance or on features common to the two examples. However, when two examples from the same class share few features, these methods have difficulty finding such pairs and achieving high recall. A typical example is deciding whether two abbreviated author names in different citations refer to the same person. Because the similarity between such same-class examples is close to zero, classifiers fail to distinguish positive pairs from negative pairs. One way to avoid the zero-similarity problem is to use conjunctions of different features across examples, but a straightforward implementation makes the computational cost prohibitive for practical problems. By using a kernel on pair instances, our method exploits feature conjunctions across examples without explicitly computing the feature mapping, which is computationally expensive. The kernel is a tensor product of two inner products on the original feature space. The corresponding feature mapping generates conjunctions of features only across the two different examples, whereas that of the conventional polynomial kernel also generates conjunctions of features from the same example, which are irrelevant to pairwise classification and degrade accuracy. Our experiments on the author matching problem show that this method can achieve precision 4 to 8 times higher than that of previous methods at medium recall levels.
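As a minimal sketch of the tensor-product idea described above, the snippet below assumes the pair kernel takes the form K((x1, x2), (y1, y2)) = <x1, y1> * <x2, y2>. This is one natural reading of "a tensor product of two inner products", not necessarily the exact definition used in the paper (for instance, any symmetrization over the order of the two examples is not addressed here). The function names and toy vectors are hypothetical. The sketch checks that the kernel value equals the inner product of explicitly built cross-example feature conjunctions, so the explicit mapping never needs to be computed.

```python
from itertools import product


def dot(u, v):
    """Plain inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def pair_kernel(pair_a, pair_b):
    """Assumed pair kernel: product of two inner products.

    Each pair is (first_example, second_example); the first examples are
    compared with each other, and likewise the second examples.
    """
    (x1, x2), (y1, y2) = pair_a, pair_b
    return dot(x1, y1) * dot(x2, y2)


def explicit_cross_features(pair):
    """Explicit mapping: every product x1[i] * x2[j] across the two examples.

    Only cross-example conjunctions appear; no x1[i] * x1[j] terms are
    generated, unlike a polynomial kernel on the concatenated vector.
    """
    x1, x2 = pair
    return [a * b for a, b in product(x1, x2)]


# Hypothetical toy pairs of binary feature vectors.
pair_a = ([1, 0, 1], [0, 1, 1])
pair_b = ([1, 1, 0], [0, 1, 0])

implicit = pair_kernel(pair_a, pair_b)
explicit = dot(explicit_cross_features(pair_a),
               explicit_cross_features(pair_b))
print(implicit, explicit)
```

For d-dimensional examples the explicit mapping has d * d components, while the kernel evaluation costs only two d-dimensional inner products, which is the computational saving the abstract refers to.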