Random Sample Partition: A Distributed Data Model for Big Data Analysis
[1] Taghi M. Khoshgoftaar et al. A survey of open source tools for machine learning with big data in the Hadoop ecosystem, 2015, Journal of Big Data.
[2] Reynold Xin et al. Apache Spark, 2016.
[3] Surajit Chaudhuri et al. Effective use of block-level sampling in statistics estimation, 2004, SIGMOD '04.
[4] Yang Wang et al. Distributed and parallel construction method for equi-width histogram in cloud database, 2017, Multiagent Grid Syst.
[5] Junfeng Yang et al. Optimizing Data Partitioning for Data-Parallel Computing, 2011, HotOS.
[6] Sanjay Ghemawat et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[7] Yu-Lin He et al. A Two-Stage Data Processing Algorithm to Generate Random Sample Partitions for Big Data Analysis, 2018, CLOUD.
[8] Lior Rokach et al. Ensemble-based classifiers, 2010, Artificial Intelligence Review.
[9] Ion Stoica et al. BlinkDB: queries with bounded errors and bounded response times on very large data, 2012, EuroSys '13.
[10] L. Ryan et al. Sufficiency Revisited: Rethinking Statistical Algorithms in the Big Data Era, 2017.
[11] Vladimir Vlassov et al. Block Sampling: Efficient Accurate Online Aggregation in MapReduce, 2013, 2013 IEEE 5th International Conference on Cloud Computing Technology and Science.
[12] Purnamrita Sarkar et al. A scalable bootstrap for massive data, 2011, arXiv:1112.5016.
[13] Hairong Kuang et al. The Hadoop Distributed File System, 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[14] Shraddha Phansalkar et al. Survey of data partitioning algorithms for big data stores, 2016, 2016 Fourth International Conference on Parallel, Distributed and Grid Computing (PDGC).
[15] Sparsh Mittal et al. A Survey of Techniques for Approximate Computing, 2016, ACM Comput. Surv.
[16] Julian J. Faraway et al. When small data beats big data, 2018.
[17] Thu D. Nguyen et al. ApproxHadoop: Bringing Approximations to MapReduce Frameworks, 2015, ASPLOS.
[18] Peter J. Haas et al. Sampling for Scalable Visual Analytics, 2017, IEEE Computer Graphics and Applications.
[19] Ameet Talwalkar et al. Knowing when you're wrong: building fast and reliable approximate query processing systems, 2014, SIGMOD Conference.
[20] Srikanth Kandula et al. Quickr: Lazily Approximating Complex AdHoc Queries in BigData Clusters, 2016, SIGMOD Conference.
[21] Purnamrita Sarkar et al. The Big Data Bootstrap, 2012, ICML.
[22] R. Tibshirani et al. An introduction to the bootstrap, 1993.
[23] Fei Xu et al. Sampling Based Range Partition Methods for Big Data Analytics, 2012.
[24] Tim Kraska et al. A sample-and-clean framework for fast and accurate query processing on dirty data, 2014, SIGMOD Conference.
[25] Bowei Xi et al. Large complex data: divide and recombine (D&R) with RHIPE, 2012.
[26] Bernhard Schölkopf et al. A Kernel Two-Sample Test, 2012, J. Mach. Learn. Res.
[27] Christof Fetzer et al. IncApprox: A Data Analytics System for Incremental Approximate Computing, 2016, WWW.
[28] Yu-Lin He et al. Empirical Analysis of Asymptotic Ensemble Learning for Big Data, 2016, 2016 IEEE/ACM 3rd International Conference on Big Data Computing Applications and Technologies (BDCAT).
[29] Ravi Nair et al. Big data needs approximate computing, 2014, Commun. ACM.
[30] Nicole A. Lazar. The Big Picture: Divide and Combine to Conquer Big Data, 2018.
[31] Xiaofeng Meng et al. An Efficient Block Sampling Strategy for Online Aggregation in the Cloud, 2015, WAIM.
[32] Joshua Zhexue Huang et al. Big data analytics on Apache Spark, 2016, International Journal of Data Science and Analytics.
[33] Charu C. Aggarwal et al. Data Mining: The Textbook, 2015.