Using Feature Conjunctions Across Examples for Learning Pairwise Classifiers

We propose a kernel method that uses combinations of features across example pairs to learn pairwise classifiers. Determining whether two instances belong to the same class is a core task in duplicate detection, entity matching, and other clustering problems, but it is difficult when instances have few discriminative features. A typical example is deciding whether two abbreviated author names in different papers refer to the same person. Using conjunctions of features drawn from each instance of a pair can improve classification accuracy, but constructing them explicitly is computationally expensive. Our method captures interactions between different features through a kernel, which avoids this cost. At medium recall levels, it achieves precision 4 to 8 times higher than that of previous methods on author matching problems.
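
The computational point can be illustrated with a minimal sketch. It assumes each instance is a small binary feature vector and uses a tensor-product kernel over pairs; this may differ from the exact kernel defined in the paper. All names (`explicit_conjunctions`, `pair_kernel`, `symmetric_pair_kernel`) and the toy vectors are hypothetical. The sketch shows that the inner product over all d × d cross-example conjunctions equals a product of two ordinary d-dimensional inner products, so the conjunction space never needs to be built.

```python
from itertools import product


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def explicit_conjunctions(x_a, x_b):
    """Explicit map: every cross-example product x_a[i] * x_b[j] (d * d values)."""
    return [a * b for a, b in product(x_a, x_b)]


def pair_kernel(pair1, pair2):
    """Implicit version: two d-dimensional inner products, no d * d vector built."""
    (x_a, x_b), (x_c, x_d) = pair1, pair2
    return dot(x_a, x_c) * dot(x_b, x_d)


def symmetric_pair_kernel(pair1, pair2):
    """Assumed symmetrization so that the order within a pair does not matter."""
    (x_a, x_b), (x_c, x_d) = pair1, pair2
    return dot(x_a, x_c) * dot(x_b, x_d) + dot(x_a, x_d) * dot(x_b, x_c)


# Toy pairs of binary feature vectors (hypothetical data).
p1 = ([1, 0, 1], [0, 1, 1])
p2 = ([1, 1, 0], [0, 1, 0])

explicit = dot(explicit_conjunctions(*p1), explicit_conjunctions(*p2))
implicit = pair_kernel(p1, p2)
print(explicit == implicit)  # both routes give the same value
```

For d features per instance, the explicit map has d² entries per pair, while the kernel needs only two inner products of length d. A function like this could be supplied as a custom kernel to any kernel-based learner such as an SVM.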
