SampleClean: Fast and Reliable Analytics on Dirty Data
Tim Kraska | Tova Milo | Sanjay Krishnan | Kenneth Y. Goldberg | Eugene Wu | Michael J. Franklin | Jiannan Wang
[1] Léon Bottou, et al. Stochastic Gradient Descent Tricks, 2012, Neural Networks: Tricks of the Trade.
[2] Sanjay Krishnan, et al. Wisteria: Nurturing Scalable Data Cleaning Infrastructure, 2015, Proc. VLDB Endow.
[3] Tim Kraska, et al. Stale View Cleaning: Getting Fresh Answers from Stale Materialized Views, 2015, Proc. VLDB Endow.
[4] Tim Kraska, et al. Tupleware: Distributed Machine Learning on Small Clusters, 2014, IEEE Data Eng. Bull.
[5] Sridhar Ramaswamy, et al. The Aqua approximate query answering system, 1999, SIGMOD '99.
[6] Joseph M. Hellerstein, et al. Online aggregation and continuous query support in MapReduce, 2010, SIGMOD Conference.
[7] F. Olken, et al. Maintenance of materialized views of sampling queries, 1992, [1992] Eighth International Conference on Data Engineering.
[8] Jayati. The Berkeley Data Analytics Stack (BDAS), 2014, 2014 Conference on IT in Business, Industry and Government (CSIBIG).
[9] Tim Kraska, et al. CrowdER: Crowdsourcing Entity Resolution, 2012, Proc. VLDB Endow.
[10] J. Manthorpe. Land Registration and Land Valuation in the United Kingdom and in the Countries of the United Nations Economic Commission for Europe (UNECE), 1998.
[11] Ahmed K. Elmagarmid, et al. Guided data repair, 2011, Proc. VLDB Endow.
[12] Surajit Chaudhuri, et al. Optimized stratified sampling for approximate query processing, 2007, TODS.
[13] Felix Naumann, et al. The Stratosphere platform for big data analytics, 2014, The VLDB Journal.
[14] Raymond J. Mooney, et al. Adaptive duplicate detection using learnable string similarity measures, 2003, KDD '03.
[15] E. R. Cohen. An Introduction to Error Analysis: The Study of Uncertainties in Physical Measurements, 1998.
[16] Wenfei Fan, et al. Foundations of Data Quality Management, 2012, Foundations of Data Quality Management.
[17] Martin L. Kersten, et al. SciBORQ: Scientific data management with Bounds On Runtime and Quality, 2011, CIDR.
[18] Helen J. Wang, et al. Online aggregation, 1997, SIGMOD '97.
[19] Tim Kraska, et al. A sample-and-clean framework for fast and accurate query processing on dirty data, 2014, SIGMOD Conference.
[20] Jennifer Widom, et al. CrowdFill: collecting structured data from the crowd, 2014, SIGMOD Conference.
[21] Divesh Srivastava, et al. Big data integration, 2013, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[22] Jun S. Liu, et al. Metropolized independent sampling with comparisons to rejection sampling and importance sampling, 1996, Stat. Comput.
[23] Peter Christen, et al. Febrl: a freely available record linkage system with a graphical user interface, 2008.
[24] Ahmed Eldawy, et al. NADEEF: a commodity data cleaning system, 2013, SIGMOD '13.
[25] Sanjay Krishnan, et al. ActiveClean: Interactive Data Cleaning While Learning Convex Loss Models, 2016, ArXiv.
[26] Frank Olken, et al. Random Sampling from Databases, 1993.
[27] Kun Li, et al. The MADlib Analytics Library or MAD Skills, the SQL, 2012, Proc. VLDB Endow.
[28] Theodore Johnson, et al. Exploratory Data Mining and Data Cleaning, 2003.
[29] Doron Rotem, et al. Simple Random Sampling from Relational Databases, 1986, VLDB.
[30] Beng Chin Ooi, et al. Distributed Online Aggregation, 2009, Proc. VLDB Endow.
[31] Thomas Oberlechner. Psychology of Judgment and Decision-Making, 2006.
[32] Joseph M. Hellerstein, et al. Quantitative Data Cleaning for Large Databases, 2008.
[33] Samuel Madden, et al. Scorpion: Explaining Away Outliers in Aggregate Queries, 2013, Proc. VLDB Endow.
[34] Surajit Chaudhuri, et al. Dynamic sample selection for approximate query processing, 2003, SIGMOD '03.
[35] Suman Nath, et al. Tracing data errors with view-conditioned causality, 2011, SIGMOD '11.
[36] Ion Stoica, et al. BlinkDB: queries with bounded errors and bounded response times on very large data, 2012, EuroSys '13.
[37] E. H. Simpson, et al. The Interpretation of Interaction in Contingency Tables, 1951.
[38] Alon Y. Halevy, et al. Pay-as-you-go user feedback for dataspace systems, 2008, SIGMOD Conference.
[39] Chris Jermaine, et al. Online aggregation for large MapReduce jobs, 2011, Proc. VLDB Endow.
[40] Paolo Papotti, et al. Descriptive and prescriptive data cleaning, 2014, SIGMOD Conference.
[41] Jianzhong Li, et al. Towards certain fixes with editing rules and master data, 2010, The VLDB Journal.
[42] Minos N. Garofalakis, et al. Approximate Query Processing: Taming the TeraBytes, 2001, VLDB.
[43] Jeffrey F. Naughton, et al. Corleone: hands-off crowdsourcing for entity matching, 2014, SIGMOD Conference.
[44] Paolo Papotti, et al. KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing, 2015, SIGMOD Conference.