Automating Large-Scale Data Quality Verification
Felix Bießmann | Dustin Lange | Sebastian Schelter | Philipp Schmidt | Meltem Celikel | Andreas Grafberger
[1] Joseph M. Hellerstein,et al. Ground: A Data Context Service , 2017, CIDR.
[2] Wenfei Fan,et al. Conditional Functional Dependencies for Data Cleaning , 2007, 2007 IEEE 23rd International Conference on Data Engineering.
[3] Ameet Talwalkar,et al. MLlib: Machine Learning in Apache Spark , 2015, J. Mach. Learn. Res..
[4] Peter Norvig,et al. The Unreasonable Effectiveness of Data , 2009, IEEE Intelligent Systems.
[5] Paolo Papotti,et al. Discovering Denial Constraints , 2013, Proc. VLDB Endow..
[6] Dennis Shasha,et al. Declarative Data Cleaning: Language, Model, and Algorithms , 2001, VLDB.
[7] Manasi Vartak,et al. ModelDB: a system for machine learning model management , 2016, HILDA '16.
[8] Joos-Hendrik Böse,et al. Probabilistic Demand Forecasting at Scale , 2017, Proc. VLDB Endow..
[9] D. Sculley,et al. The ML test score: A rubric for ML production readiness and technical debt reduction , 2017, 2017 IEEE International Conference on Big Data (Big Data).
[10] Sanjay Krishnan,et al. ActiveClean: An Interactive Data Cleaning Framework For Modern Machine Learning , 2016, SIGMOD Conference.
[11] Xin Zhang,et al. TFX: A TensorFlow-Based Production-Scale Machine Learning Platform , 2017, KDD.
[12] Sebastian Schelter,et al. Automatically Tracking Metadata and Provenance of Machine Learning Experiments , 2017 .
[13] Benjamin Recht,et al. KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics , 2016, 2017 IEEE 33rd International Conference on Data Engineering (ICDE).
[14] Sebastian Link,et al. Data Quality: The Role of Empiricism , 2018, SIGMOD Rec..
[15] Felix Naumann,et al. A Hybrid Approach to Functional Dependency Discovery , 2016, SIGMOD Conference.
[16] Samridhi Jha. Data Infrastructure for Machine Learning , 2019 .
[17] D. Sculley,et al. The Data Linter: Lightweight Automated Sanity Checking for ML Data Sets , 2017 .
[18] Sanjeev Khanna,et al. Space-efficient online computation of quantile summaries , 2001, SIGMOD '01.
[19] Ahmed K. Elmagarmid,et al. Guided data repair , 2011, Proc. VLDB Endow..
[20] D. Sculley,et al. Hidden Technical Debt in Machine Learning Systems , 2015, NIPS.
[21] Felix Naumann,et al. Profiling relational data: a survey , 2015, The VLDB Journal.
[22] Joseph M. Hellerstein,et al. Quantitative Data Cleaning for Large Databases , 2008 .
[23] Valentin Flunkert,et al. DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks , 2017, International Journal of Forecasting.
[24] Scott Shenker,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.
[25] Alexander Hall,et al. HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm , 2013, EDBT '13.
[26] Juliana Freire,et al. noWorkflow: a Tool for Collecting, Analyzing, and Managing Provenance from Python Scripts , 2017, Proc. VLDB Endow..
[27] Carlo Batini,et al. Methodologies for data quality assessment and improvement , 2009, CSUR.
[28] Amol Deshpande,et al. On Model Discovery For Hosted Data Science Projects , 2017, DEEM@SIGMOD.
[29] Sebastian Schelter,et al. Declarative Metadata Management : A Missing Piece in End-To-End Machine Learning , 2018 .
[30] Sanjay Krishnan,et al. BoostClean: Automated Error Detection and Repair for Machine Learning , 2017, ArXiv.
[31] Chen Sun,et al. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[32] Michal Zielinski,et al. Versioning for End-to-End Machine Learning Pipelines , 2017, DEEM@SIGMOD.
[33] Ihab F. Ilyas,et al. Trends in Cleaning Relational Data: Consistency and Deduplication , 2015, Found. Trends Databases.
[34] Larry S. Davis,et al. Towards Unified Data and Lifecycle Management for Deep Learning , 2016, 2017 IEEE 33rd International Conference on Data Engineering (ICDE).
[35] Christopher Ré,et al. The HoloClean Framework , 2017 .
[36] Felix Naumann,et al. Quality-Driven Query Answering for Integrated Information Systems , 2002, Lecture Notes in Computer Science.
[37] Gaël Varoquaux,et al. Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..
[38] George Athanasopoulos,et al. Forecasting: principles and practice , 2013 .
[39] Cédric Archambeau,et al. An interpretable latent variable model for attribute applicability in the Amazon catalogue , 2017, ArXiv.
[40] Jeffrey F. Naughton,et al. Model Selection Management Systems: The Next Frontier of Advanced Analytics , 2016, SIGMOD Rec..
[41] J. Manthorpe. Land Registration and Land Valuation in the United Kingdom and in the Countries of the United Nations Economic Commission for Europe (UNECE) , 1998 .
[42] Ihab F. Ilyas,et al. Data Cleaning: Overview and Emerging Challenges , 2016, SIGMOD Conference.
[43] Alon Y. Halevy,et al. Goods: Organizing Google's Datasets , 2016, SIGMOD Conference.
[44] Luís Torgo,et al. OpenML: networked science in machine learning , 2014, SIGKDD Explor..
[45] Neoklis Polyzotis,et al. Data Management Challenges in Production Machine Learning , 2017, SIGMOD Conference.
[46] Theodore Johnson,et al. Mining database structure; or, how to build a data quality browser , 2002, SIGMOD '02.
[47] David R. Karger,et al. Tackling the Poor Assumptions of Naive Bayes Text Classifiers , 2003, ICML.
[48] Sanjeev Khanna,et al. Space-efficient online computation of quantile summaries , 2001 .
[49] Joseph K. Bradley,et al. Spark SQL: Relational Data Processing in Spark , 2015, SIGMOD Conference.
[50] Felix Naumann,et al. Cardinality Estimation: An Experimental Survey , 2017, Proc. VLDB Endow..