Overview of Record Linkage and Current Research Directions
[1] W. Deming,et al. On a Method of Estimating Birth and Death Rates and the Extent of Registration (Excerpt) , 1949 .
[2] Samuel B. Williams,et al. ASSOCIATION FOR COMPUTING MACHINERY , 2000 .
[3] W. Deming,et al. On the Problem of Matching Lists by Samples , 1959 .
[4] H. B. Newcombe,et al. Automatic linkage of vital records. , 1959, Science.
[5] Howard B. Newcombe,et al. Record linkage: making maximum use of the discriminating power of identifying information , 1962, CACM.
[6] H. Newcombe,et al. Methods for Computer Linkage of Hospital Admission-Separation Records into Cumulative Health Histories , 1975, Methods of Information in Medicine.
[7] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper , 1977 .
[8] William S. Cooper,et al. Foundations of Probabilistic and Utility-Theoretic Indexing , 1978, JACM.
[9] Antonio Zamora,et al. Automatic spelling correction in scientific and scholarly text , 1984, CACM.
[10] Howard B. Newcombe,et al. Handbook of record linkage: methods for health and statistical studies, administration, and business , 1988 .
[11] Matthew A. Jaro,et al. Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida , 1989 .
[12] William E. Winkler,et al. Frequency-Based Matching in the Fellegi-Sunter Model of Record Linkage , 1989 .
[13] William E. Winkler. On Dykstra's Iterative Fitting Procedure , 1990 .
[14] William E. Winkler,et al. String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. , 1990 .
[16] Fritz Scheuren,et al. Regression Analysis of Data Files that Are Computer Matched , 1993 .
[17] W. Winkler. Improved Decision Rules in the Fellegi-Sunter Model of Record Linkage , 1993 .
[18] William E. Winkler,et al. Advanced Methods For Record Linkage , 1994 .
[19] Salvatore J. Stolfo,et al. The merge/purge problem for large databases , 1995, SIGMOD '95.
[20] D. Rubin,et al. A method for calibrating false-match rates in record linkage , 1995 .
[21] Yoav Freund,et al. Experiments with a New Boosting Algorithm , 1996, ICML.
[22] Thomas G. Dietterich. What is machine learning? , 2020, Archives of Disease in Childhood.
[23] William E. Winkler,et al. Approximate String Comparison and its Effect on an Advanced Record Linkage System , 1997 .
[24] John D. Lafferty,et al. Inducing Features of Random Fields , 1995, IEEE Trans. Pattern Anal. Mach. Intell..
[25] Vladimir Cherkassky,et al. The Nature Of Statistical Learning Theory , 1997, IEEE Trans. Neural Networks.
[26] Edward H. Porter,et al. Approximate String Comparison and its Effect , 1997 .
[27] L. Sweeney. Computational Disclosure Control for Medical Microdata , 1997 .
[28] Peter N. Yianilos,et al. Learning String-Edit Distance , 1996, IEEE Trans. Pattern Anal. Mach. Intell..
[29] William E. Winkler,et al. Re-identification Methods for Evaluating the Confidentiality of Analytically Valid Microdata , 1998 .
[30] Avi Pfeffer,et al. Probabilistic Frame-Based Systems , 1998, AAAI/IAAI.
[31] R. Tibshirani,et al. Additive Logistic Regression: a Statistical View of Boosting , 1998 .
[32] Roberto Grossi,et al. The string B-tree: a new data structure for string search in external memory and its applications , 1999, JACM.
[33] William E. Winkler,et al. The State of Record Linkage and Current Research Problems , 1999 .
[34] Yaacov Ritov,et al. Tracking Many Objects with Many Sensors , 1999, IJCAI.
[35] W. Winkler. Using the EM Algorithm for Weight Computation in the Fellegi-Sunter Model of Record Linkage , 2000 .
[36] Vladimir N. Vapnik,et al. The Nature of Statistical Learning Theory, Second Edition , 2000, Statistics for Engineering and Information Science.
[37] J. Friedman. Special Invited Paper-Additive logistic regression: A statistical view of boosting , 2000 .
[38] Andrew McCallum,et al. Efficient clustering of high-dimensional data sets with application to reference matching , 2000, KDD '00.
[39] William E. Yancey. Frequency-Dependent Probability Measures for Record Linkage , 2000 .
[40] W. Winkler. Machine Learning, Information Retrieval, and Record Linkage , 2000 .
[41] Erhard Rahm,et al. Data Cleaning: Problems and Current Approaches , 2000, IEEE Data Eng. Bull..
[42] William W. Cohen,et al. Learning to Match and Cluster Entity Names , 2001 .
[43] William E. Winkler. Quality of Very Large Databases , 2001 .
[44] Ben Taskar,et al. Probabilistic Classification and Clustering in Relational Data , 2001, IJCAI.
[45] Sunita Sarawagi,et al. Automatic segmentation of text into structured records , 2001, SIGMOD '01.
[46] Luis Gravano,et al. Approximate String Joins in a Database (Almost) for Free , 2001, VLDB.
[47] Stuart J. Russell,et al. Approximate inference for first-order probabilistic languages , 2001, IJCAI.
[48] Michael I. Jordan,et al. On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes , 2001, NIPS.
[49] Craig A. Knoblock,et al. Learning object identification rules for information integration , 2001, Inf. Syst..
[50] D. Rubin,et al. Iterative Automated Record Linkage Using Mixture Models , 2001 .
[51] Gonzalo Navarro,et al. A guided tour to approximate string matching , 2001, CSUR.
[52] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[53] Anuradha Bhamidipaty,et al. Interactive deduplication using active learning , 2002, KDD.
[54] Stuart J. Russell,et al. Identity Uncertainty and Citation Matching , 2002, NIPS.
[55] P. Lahiri,et al. Model-Based Analysis of Records Linked Using Mixture Models , 2002 .
[56] Peter Christen,et al. Preparation of name and address data for record linkage using hidden Markov models , 2002, BMC Medical Informatics Decis. Mak..
[57] Fabrizio Sebastiani,et al. Machine learning in automated text categorization , 2001, CSUR.
[58] Peter Christen,et al. Probabilistic Name and Address Cleaning and Standardisation , 2002, AusDM.
[59] Luca De Santis,et al. Automatic Record Matching in Cooperative Information Systems , 2002 .
[60] Ben Taskar,et al. Discriminative Probabilistic Models for Relational Data , 2002, UAI.
[61] Vijay S. Iyengar,et al. Transforming data to satisfy privacy constraints , 2002, KDD.
[62] Ahmed K. Elmagarmid,et al. TAILOR: a record linkage toolbox , 2002, Proceedings 18th International Conference on Data Engineering.
[63] Simon D. Woodcock,et al. Disclosure Limitation in Longitudinal Linked Data , 2002 .
[64] Ramasamy Uthurusamy,et al. Evolving Data Mining into Solutions for Insights , 2002 .
[65] Ben Taskar,et al. Learning Probabilistic Models of Link Structure , 2003, J. Mach. Learn. Res..
[66] Erhard Rahm,et al. COMA - A System for Flexible Combination of Schema Matching Approaches , 2002, VLDB.
[67] Surajit Chaudhuri,et al. Eliminating Fuzzy Duplicates in Data Warehouses , 2002, VLDB.
[68] William E. Winkler,et al. Disclosure Risk Assessment in Perturbative Microdata Protection , 2002, Inference Control in Statistical Databases.
[69] Ben Taskar,et al. Learning on the Test Data: Leveraging Unseen Features , 2003, ICML.
[70] Lars Vilhuber,et al. The Sensitivity of Economic Statistics to Coding Errors in Personal Identifiers , 2003 .
[71] L. Sweeney,et al. Trail Re-Identification: Learning Who You Are From Where You Have Been , 2003 .
[72] Ben Taskar,et al. Link Prediction in Relational Data , 2003, NIPS.
[73] William E. Winkler. Data Cleaning Methods , 2003 .
[74] Andrew McCallum,et al. Object Consolidation by Graph Partitioning with a Conditionally-Trained Distance Metric , 2003 .
[75] Hanan Samet,et al. Index-driven similarity search in metric spaces (Survey Article) , 2003, TODS.
[76] Peter Christen,et al. A Comparison of Fast Blocking Methods for Record Linkage , 2003, KDD 2003.
[77] Hiroshi Ishikawa,et al. Exact Optimization for Markov Random Fields with Convex Priors , 2003, IEEE Trans. Pattern Anal. Mach. Intell..
[78] Pradeep Ravikumar,et al. A Comparison of String Distance Metrics for Name-Matching Tasks , 2003, IIWeb.
[79] Pradeep Ravikumar,et al. Adaptive Name Matching in Information Integration , 2003, IEEE Intell. Syst..
[80] Dale Schuurmans,et al. Learning Mixture Models with the Latent Maximum Entropy Principle , 2003, ICML.
[81] Raymond J. Mooney,et al. Adaptive duplicate detection using learnable string similarity measures , 2003, KDD '03.
[82] Chen Li,et al. Efficient record linkage in large data sets , 2003, Eighth International Conference on Database Systems for Advanced Applications, 2003. (DASFAA 2003). Proceedings..
[83] Mikhail Bilenko,et al. On Evaluation and Training-Set Construction for Duplicate Detection , 2003 .
[84] William E. Winkler,et al. Methods for evaluating and creating data quality , 2004, Inf. Syst..
[85] Jie Wei,et al. Markov Edit Distance , 2004, IEEE Trans. Pattern Anal. Mach. Intell..
[86] William W. Cohen,et al. Exploiting dictionaries in named entity extraction: combining semi-Markov extraction processes and data integration methods , 2004, KDD.
[87] Sudipto Guha,et al. Merging the Results of Approximate Match Operations , 2004, VLDB.
[88] I. P. Fellegi,et al. A Theory for Record Linkage , 1969 .
[89] Pradeep Ravikumar,et al. Variational Chernoff Bounds for Graphical Models , 2004, UAI.
[90] Divesh Srivastava,et al. Flexible String Matching Against Large Databases in Practice , 2004, VLDB.
[91] D. Ruppert. The Elements of Statistical Learning: Data Mining, Inference, and Prediction , 2004 .
[92] Bianca Zadrozny,et al. Learning and evaluating classifiers under sample selection bias , 2004, ICML.
[93] William E. Winkler,et al. Masking and Re-identification Methods for Public-Use Microdata: Overview and Research Problems , 2004, Privacy in Statistical Databases.
[94] William E. Yancey. An Adaptive String Comparator for Record Linkage , 2004 .
[95] Ulf Brefeld,et al. Co-EM support vector learning , 2004, ICML.
[96] Pradeep Ravikumar,et al. A Hierarchical Graphical Model for Record Linkage , 2004, UAI.
[97] Vicenç Torra,et al. OWA operators in data modeling and reidentification , 2004, IEEE Transactions on Fuzzy Systems.
[98] John M. Abowd,et al. Multiply-Imputing Confidential Characteristics and File Links in Longitudinal Linked Data , 2004, Privacy in Statistical Databases.
[99] William E. Winkler,et al. Re-identification Methods for Masked Microdata , 2004, Privacy in Statistical Databases.
[100] Renato Bruni. Discrete models for data imputation , 2004, Discret. Appl. Math..
[101] Tong Zhang,et al. Text Categorization Based on Regularized Linear Classification Methods , 2001, Information Retrieval.
[102] Eugene Agichtein,et al. Mining reference tables for automatic text segmentation , 2004, KDD.
[103] Michael D. Larsen,et al. Hierarchical Bayesian Record Linkage Theory , 2005 .
[104] P. Lahiri,et al. Regression Analysis With Linked Data , 2005 .
[105] W. Winkler. Approximate String Comparator Search Strategies for Very Large Administrative Lists (Statistics #2005-02) , 2005 .
[106] Andrew McCallum,et al. A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance , 2005, UAI.
[107] Rajeev Motwani,et al. Robust identification of fuzzy duplicates , 2005, 21st International Conference on Data Engineering (ICDE'05).
[108] Andrew McCallum,et al. Joint deduplication of multiple record types in relational data , 2005, CIKM '05.
[109] Philip S. Yu,et al. An improved categorization of classifier's sensitivity on sample selection bias , 2005, Fifth IEEE International Conference on Data Mining (ICDM'05).
[110] William E. Yancey. Evaluating String Comparator Performance for Record Linkage , 2005 .
[111] Jayant Madhavan,et al. Reference reconciliation in complex information spaces , 2005, SIGMOD '05.
[112] Renato Bruni,et al. Error correction for massive datasets , 2005, Optim. Methods Softw..
[113] Zoubin Ghahramani,et al. Proceedings of the 24th international conference on Machine learning , 2007, ICML 2007.
[114] Jennifer Widom,et al. Swoosh: a generic approach to entity resolution , 2008, The VLDB Journal.