Auto-Validate: Unsupervised Data Validation Using Data-Domain Patterns Inferred from Data Lakes
[1] Nicolas Bruno,et al. SCOPE: parallel databases meet MapReduce , 2012, The VLDB Journal.
[2] Sriram Vasudevan,et al. Data Sentinel: A Declarative Production-Scale Data Validation Platform , 2020, 2020 IEEE 36th International Conference on Data Engineering (ICDE).
[3] Robert Gruber,et al. PADS: a domain-specific language for processing ad hoc data , 2005, PLDI '05.
[4] Jingren Zhou,et al. SCOPE: easy and efficient parallel processing of massive data sets , 2008, Proc. VLDB Endow..
[5] Erhard Rahm,et al. A survey of approaches to automatic schema matching , 2001, The VLDB Journal.
[6] Erhard Rahm,et al. Data Cleaning: Problems and Current Approaches , 2000, IEEE Data Eng. Bull..
[7] William W. Cohen,et al. Language-Independent Set Expansion of Named Entities Using the Web , 2007, Seventh IEEE International Conference on Data Mining (ICDM 2007).
[8] Paolo Papotti,et al. Discovering Denial Constraints , 2013, Proc. VLDB Endow..
[9] Raul Castro Fernandez,et al. Extracting Syntactical Patterns from Databases , 2018, 2018 IEEE 34th International Conference on Data Engineering (ICDE).
[10] Paul Brown,et al. CORDS: automatic discovery of correlations and soft functional dependencies , 2004, SIGMOD '04.
[11] Richard M. Karp,et al. A fast parallel algorithm for the maximal independent set problem , 1985, JACM.
[12] Patricia S. O'Sullivan,et al. 100 Statistical Tests , 1995 .
[13] Alekh Jindal,et al. Big Data Processing at Microsoft: Hyper Scale, Massive Complexity, and Minimal Cost , 2019, SoCC.
[14] Heikki Mannila,et al. Approximate Inference of Functional Dependencies from Relations , 1995, Theor. Comput. Sci..
[15] Sebastian Schelter,et al. Differential Data Quality Verification on Partitioned Data , 2019, 2019 IEEE 35th International Conference on Data Engineering (ICDE).
[16] Joseph M. Hellerstein,et al. Potter's Wheel: An Interactive Data Cleaning System , 2001, VLDB.
[17] Felix Bießmann,et al. Automating Large-Scale Data Quality Verification , 2018, Proc. VLDB Endow..
[18] Reynold Cheng,et al. SCODED: Statistical Constraint Oriented Data Error Detection , 2020, SIGMOD Conference.
[19] Felix Naumann,et al. Data profiling revisited , 2014, SIGMOD Rec..
[20] Felix Bießmann,et al. Unit Testing Data with Deequ , 2019, SIGMOD Conference.
[21] Winfried Just,et al. Computational Complexity of Multiple Sequence Alignment with SP-Score , 2001, J. Comput. Biol..
[22] D. Sculley,et al. The Data Linter: Lightweight Automated Sanity Checking for ML Data Sets , 2017 .
[23] Theodoros Rekatsinas,et al. HoloDetect: Few-Shot Learning for Error Detection , 2019, SIGMOD Conference.
[24] Yeye He,et al. Uni-Detect: A Unified Approach to Automated Error Detection in Tables , 2019, SIGMOD Conference.
[25] Michael Stonebraker,et al. Raha: A Configuration-Free Error Detection System , 2019, SIGMOD Conference.
[26] Eric Crestan,et al. Web-Scale Distributional Similarity and Entity Set Expansion , 2009, EMNLP.
[27] Renée J. Miller,et al. Discovering data quality rules , 2008, Proc. VLDB Endow..
[28] W. Tan,et al. Sato , 2019, Proc. VLDB Endow..
[29] Yeye He,et al. Auto-Detect: Data-Driven Error Detection in Tables , 2018, SIGMOD Conference.
[30] Neoklis Polyzotis,et al. Data Management Challenges in Production Machine Learning , 2017, SIGMOD Conference.
[31] Michael Stonebraker,et al. Data Integration: The Current Status and the Way Forward , 2018, IEEE Data Eng. Bull..
[32] A. Agresti. [A Survey of Exact Inference for Contingency Tables]: Rejoinder , 1992 .
[33] Tim Kraska,et al. Sherlock: A Deep Learning Approach to Semantic Data Type Detection , 2019, KDD.
[34] Xin-She Yang,et al. Introduction to Algorithms , 2021, Nature-Inspired Optimization Algorithms.
[35] Panos Vassiliadis,et al. Near Real Time ETL , 2009, New Trends in Data Warehousing and Data Analysis.
[36] Norman W. Paton,et al. Dataset Discovery in Data Lakes , 2020, 2020 IEEE 36th International Conference on Data Engineering (ICDE).
[37] Zifan Liu,et al. Picket: Self-supervised Data Diagnostics for ML Pipelines , 2020, ArXiv.
[38] D. Lipman,et al. The multiple sequence alignment problem in biology , 1988 .
[39] Felix Naumann,et al. A Hybrid Approach to Functional Dependency Discovery , 2016, SIGMOD Conference.
[40] Sumit Gulwani,et al. FlashProfile: a framework for synthesizing data profiles , 2017, Proc. ACM Program. Lang..
[41] Kevin Wilkinson,et al. Data integration flows for business intelligence , 2009, EDBT '09.
[42] Richard M. Karp,et al. A fast parallel algorithm for the maximal independent set problem , 1984, STOC '84.
[43] Neoklis Polyzotis,et al. Data Validation for Machine Learning , 2019, SysML.
[44] Cong Yan,et al. Synthesizing Type-Detection Logic for Rich Semantic Data Types using Open-source Code , 2018, SIGMOD Conference.
[45] David Walker,et al. From dirt to shovels: fully automatic tool generation from ad hoc data , 2008, POPL '08.
[46] Michael Stonebraker,et al. ANMAT: Automatic Knowledge Discovery and Error Detection through Pattern Functional Dependencies , 2019, SIGMOD Conference.
[47] Theodore Johnson,et al. Mining database structure; or, how to build a data quality browser , 2002, SIGMOD '02.
[48] Yeye He,et al. ClusterJoin: A Similarity Joins Framework using Map-Reduce , 2014, Proc. VLDB Endow..
[49] Felix Naumann,et al. Discovery of Genuine Functional Dependencies from Relational Data with Missing Values , 2018, Proc. VLDB Endow..
[50] Felix Naumann,et al. Discovery of genuine functional dependencies from relational data with missing values , 2018, VLDB 2018.
[51] Gang Chen,et al. Metric Similarity Joins Using MapReduce , 2017, IEEE Transactions on Knowledge and Data Engineering.
[52] Yeye He,et al. SEISA: set expansion by iterative similarity aggregation , 2011, WWW.