A Survey on Geographically Distributed Big-Data Processing Using MapReduce
Ehud Gudes | Shlomi Dolev | Shantanu Sharma | Patricia Florissi | Ido Singer
[1] Paolo Papotti, et al. Road to Freedom in Big Data Analytics, 2016, EDBT.
[2] Minlan Yu, et al. Scheduling jobs across geo-distributed datacenters, 2015, SoCC.
[3] Navendu Jain, et al. An empirical analysis of intra- and inter-datacenter network failures for geo-distributed services, 2013, SIGMETRICS '13.
[4] Bin Cheng, et al. GeeLytics: Geo-distributed edge analytics for large scale IoT systems based on dynamic topology, 2015, 2015 IEEE 2nd World Forum on Internet of Things (WF-IoT).
[5] Ameet Talwalkar, et al. MLlib: Machine Learning in Apache Spark, 2015, J. Mach. Learn. Res.
[6] Dan Suciu, et al. Parallel Skyline Queries, 2012, Theory of Computing Systems.
[7] Jure Leskovec, et al. Mining of Massive Datasets, 2nd Ed., 2014.
[8] Murat Kantarcioglu, et al. SEMROD: Secure and Efficient MapReduce Over HybriD Clouds, 2015, SIGMOD Conference.
[9] Dick H. J. Epema, et al. KOALA: a co-allocating grid scheduler, 2008, Concurr. Comput. Pract. Exp.
[10] Hans-Arno Jacobsen, et al. PNUTS: Yahoo!'s hosted data serving platform, 2008, Proc. VLDB Endow.
[11] Jeffrey D. Ullman, et al. Matching bounds for the all-pairs MapReduce problem, 2013, IDEAS '13.
[12] Minyi Guo, et al. Pricing and Repurchasing for Big Data Processing in Multi-Clouds, 2016, IEEE Transactions on Emerging Topics in Computing.
[13] Carlo Curino, et al. Towards Geo-Distributed Machine Learning, 2017, IEEE Data Eng. Bull.
[14] Zoe L. Jiang, et al. Key based data analytics across data centers considering bi-level resource provision in cloud computing, 2016, Future Gener. Comput. Syst.
[15] Carlo Curino, et al. WANalytics: Analytics for a Geo-Distributed Data-Intensive World, 2015, CIDR.
[16] Patrick Th. Eugster, et al. Efficient Geo-distributed Data Processing with Rout, 2013, 2013 IEEE 33rd International Conference on Distributed Computing Systems.
[17] Lars George, et al. HBase: The Definitive Guide, 2011.
[18] Nikos Parlavantzas, et al. Resilin: Elastic MapReduce over Multiple Clouds, 2013, 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing.
[19] Tova Milo, et al. An Efficient MapReduce Cube Algorithm for Varied Data Distributions, 2016, SIGMOD Conference.
[20] Christof Fetzer, et al. EHadoop: Network I/O Aware Scheduler for Elastic MapReduce Cluster, 2015, 2015 IEEE 8th International Conference on Cloud Computing.
[21] Ehud Gudes, et al. Security and privacy aspects in MapReduce on clouds: A survey, 2016, Comput. Sci. Rev.
[22] Yuan Yuan, et al. Major technical advancements in Apache Hive, 2014, SIGMOD Conference.
[23] Song Guo, et al. Traffic-Aware Geo-Distributed Big Data Analytics with Predictable Job Completion Time, 2017, IEEE Transactions on Parallel and Distributed Systems.
[24] Jeffrey D. Ullman, et al. Assignment Problems of Different-Sized Inputs in MapReduce, 2015, ACM Trans. Knowl. Discov. Data.
[25] Joseph K. Bradley, et al. Spark SQL: Relational Data Processing in Spark, 2015, SIGMOD Conference.
[26] Aart J. C. Bik, et al. Pregel: a system for large-scale graph processing, 2010, SIGMOD Conference.
[27] Ian T. Foster, et al. Differentiated Scheduling of Response-Critical and Best-Effort Wide-Area Data Transfers, 2016, 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[28] Chenyu Wang, et al. Exploring MapReduce efficiency with highly-distributed data, 2011, MapReduce '11.
[29] Yanfei Guo. Moving MapReduce into the cloud: Elasticity, efficiency and scalability, 2015.
[30] Gabriel Antoniu, et al. SAGE: Geo-Distributed Streaming Data Analysis in Clouds, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[31] Patrick Th. Eugster, et al. From the Cloud to the Atmosphere: Running MapReduce across Data Centers, 2014, IEEE Transactions on Computers.
[32] Jeffrey D. Ullman, et al. Meta-MapReduce: A Technique for Reducing Communication in MapReduce Computations, 2015, ArXiv.
[33] Jack J. Dongarra, et al. Exascale computing and big data, 2015, Commun. ACM.
[34] Yuan Yu, et al. Dryad: distributed data-parallel programs from sequential building blocks, 2007, EuroSys '07.
[35] Anshul Jaiswal, et al. Realtime Data Processing at Facebook, 2016, SIGMOD Conference.
[36] Song Guo, et al. Cost Minimization for Big Data Processing in Geo-Distributed Data Centers, 2014, IEEE Transactions on Emerging Topics in Computing.
[37] Ian Rae, et al. F1: A Distributed SQL Database That Scales, 2013, Proc. VLDB Endow.
[38] Thomas Heinis, et al. THERMAL-JOIN: A Scalable Spatial Join for Dynamic Workloads, 2015, SIGMOD Conference.
[39] Christopher Frost, et al. Spanner: Google's Globally-Distributed Database, 2012, OSDI.
[40] Ramesh K. Sitaraman, et al. Trading Timeliness and Accuracy in Geo-Distributed Streaming Analytics, 2016, SoCC.
[41] Patrick Wendell, et al. Sparrow: distributed, low latency scheduling, 2013, SOSP.
[42] Hui Ding, et al. TAO: Facebook's Distributed Data Store for the Social Graph, 2013, USENIX Annual Technical Conference.
[43] Carlo Curino, et al. Global Analytics in the Face of Bandwidth and Regulatory Constraints, 2015, NSDI.
[44] Miguel Correia, et al. Medusa: An Efficient Cloud Fault-Tolerant MapReduce, 2016, 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid).
[45] Beng Chin Ooi, et al. Efficient Processing of k Nearest Neighbor Joins using MapReduce, 2012, Proc. VLDB Endow.
[46] Manish Parashar, et al. A case for MapReduce over the internet, 2013, CAC.
[47] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.
[48] Randy H. Katz, et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, 2011, NSDI.
[49] György Turán, et al. On the Computational Complexity of MapReduce, 2015, DISC.
[50] Ramesh K. Sitaraman, et al. Optimizing Grouped Aggregation in Geo-Distributed Streaming Analytics, 2015, HPDC.
[51] Kenneth A. Hawick, et al. Distributed frameworks and parallel algorithms for processing large-scale geographic data, 2003, Parallel Comput.
[52] Neoklis Polyzotis, et al. Iterative MapReduce for Large Scale Machine Learning, 2013, ArXiv.
[53] Xue-wen Chen, et al. Large-Scale Deep Belief Nets With MapReduce, 2014, IEEE Access.
[54] Michael J. Freedman, et al. Making Every Bit Count in Wide-Area Analytics, 2013, HotOS.
[55] Gurmeet Singh Manku, et al. Detecting near-duplicates for web crawling, 2007, WWW '07.
[56] Gautam Shroff, et al. Graph-Parallel Entity Resolution using LSH & IMM, 2014, EDBT/ICDT Workshops.
[57] Cong Yu, et al. Data Cube Materialization and Mining over MapReduce, 2012, IEEE Transactions on Knowledge and Data Engineering.
[58] Nicolas Bruno, et al. SCOPE: parallel databases meet MapReduce, 2012, The VLDB Journal.
[59] Dick H. J. Epema, et al. Resource Management for Dynamic MapReduce Clusters in Multicluster Systems, 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[60] David A. Maltz, et al. Surviving failures in bandwidth-constrained datacenters, 2012, CCRV.
[61] Min Wang, et al. Efficient Multi-way Theta-Join Processing Using MapReduce, 2012, Proc. VLDB Endow.
[62] Divyakant Agrawal, et al. DB-Risk: The Game of Global Database Placement, 2016, SIGMOD Conference.
[63] Janak H. Patel, et al. Model of Computation, 1990.
[64] Haifeng Jiang, et al. Photon: fault-tolerant and scalable joining of continuous data streams, 2013, SIGMOD '13.
[65] Sergei Vassilvitskii, et al. A model of computation for MapReduce, 2010, SODA '10.
[66] Yongli Zhu, et al. Cache conscious star-join in MapReduce environments, 2013, Cloud-I '13.
[67] Kamesh Munagala, et al. Complexity Measures for Map-Reduce, and Comparison to Parallel Computing, 2012, ArXiv.
[68] Osamu Tatebe, et al. Gfarm Grid File System, 2010, New Generation Computing.
[69] Sanjay Kumar Madria. Security and Risk Assessment in the Cloud, 2016, Computer.
[70] Kyungho Jeon, et al. The HybrEx Model for Confidentiality and Privacy in Cloud Computing, 2011, HotCloud.
[71] Jimmy J. Lin, et al. Book Reviews: Data-Intensive Text Processing with MapReduce by Jimmy Lin and Chris Dyer, 2010, CL.
[72] Himanshu Gupta, et al. ε-Controlled-Replicate: An Improved Controlled-Replicate Algorithm for Multi-way Spatial Join Processing on Map-Reduce, 2014, WISE.
[73] Vinod Kumar Vavilapalli, et al. Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2, 2014.
[74] Jignesh M. Patel, et al. Twitter Heron: Stream Processing at Scale, 2015, SIGMOD Conference.
[75] Qi Zhang, et al. Improving Hadoop Service Provisioning in a Geographically Distributed Cloud, 2014, 2014 IEEE 7th International Conference on Cloud Computing.
[76] L. Venkata Subramaniam, et al. Processing Interval Joins On Map-Reduce, 2014, EDBT.
[77] Gilles Fedak, et al. HybridMR: a new approach for hybrid MapReduce combining desktop grid and cloud infrastructures, 2015, Concurr. Comput. Pract. Exp.
[78] Divyakant Agrawal, et al. The Challenges of Global-Scale Data Management, 2016, 2017 IEEE 33rd International Conference on Data Engineering (ICDE).
[79] Vivek Kundra, et al. Federal Cloud Computing Strategy, 2011.
[80] Joaquim Sousa Pinto, et al. Sky computing, 2011, 6th Iberian Conference on Information Systems and Technologies (CISTI 2011).
[81] Manish Parashar, et al. Investigating MapReduce framework extensions for efficient processing of geographically scattered datasets, 2011, PERV.
[82] Jimmy J. Lin, et al. Summingbird: A Framework for Integrating Batch and Online MapReduce Computations, 2014, Proc. VLDB Endow.
[83] Chenyu Wang, et al. Cross-Phase Optimization in MapReduce, 2013, 2013 IEEE International Conference on Cloud Engineering (IC2E).
[84] Rajiv Ranjan, et al. G-Hadoop: MapReduce across distributed data centers for data-intensive computing, 2013, Future Gener. Comput. Syst.
[85] Ashish Gupta, et al. High-Availability at Massive Scale: Building Google's Data Infrastructure for Ads, 2015, BIRTE.
[86] Matei Zaharia, et al. Matrix Computations and Optimization in Apache Spark, 2015, KDD.
[87] Reynold Xin, et al. GraphX: a resilient distributed graph system on Spark, 2013, GRADES.
[88] Abhishek Chandra, et al. Nebula: Distributed Edge Cloud for Data Intensive Computing, 2014, 2014 IEEE International Conference on Cloud Engineering.
[89] Ramesh K. Sitaraman, et al. End-to-End Optimization for Geo-Distributed MapReduce, 2016, IEEE Transactions on Cloud Computing.
[90] XiaoFeng Wang, et al. Sedic: privacy-aware data intensive computing on hybrid clouds, 2011, CCS '11.
[91] Paramvir Bahl, et al. Low Latency Geo-distributed Data Analytics, 2015, SIGCOMM.
[92] Abhishek Chandra, et al. Redefining Data Locality for Cross-Data Center Storage, 2015, BigSystem@HPDC.
[93] Dick H. J. Epema, et al. Dynamically Scheduling a Component-Based Framework in Clusters, 2014, JSSPP.
[94] Shayan Saeed. Sandooq: improving the communication cost and service latency for a multi-user erasure-coded geo-distributed cloud environment, 2016.
[95] Joseph M. Hellerstein, et al. Distributed GraphLab: A Framework for Machine Learning in the Cloud, 2012, Proc. VLDB Endow.
[96] Hairong Kuang, et al. The Hadoop Distributed File System, 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[97] Harumi A. Kuno, et al. The mixed workload CH-benCHmark, 2011, DBTest '11.
[98] Yuan Yu, et al. TensorFlow: A system for large-scale machine learning, 2016, OSDI.
[99] Onur Mutlu, et al. Gaia: Geo-Distributed Machine Learning Approaching LAN Speeds, 2017, NSDI.
[100] Huajun Chen, et al. MapReduce-Based Pattern Finding Algorithm Applied in Motif Detection for Prescription Compatibility Network, 2009, APPT.
[101] Chen Li, et al. Efficient parallel set-similarity joins using MapReduce, 2010, SIGMOD Conference.
[102] Roland H. C. Yap, et al. Tagged-MapReduce: A General Framework for Secure Computing with Mixed-Sensitivity Data on Hybrid Clouds, 2014, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[103] Michael Stonebraker, et al. The BigDAWG polystore system and architecture, 2016, 2016 IEEE High Performance Extreme Computing Conference (HPEC).
[104] Fan Yang, et al. Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing, 2014, Proc. VLDB Endow.
[105] Nicolas Bruno, et al. Spanner: Becoming a SQL System, 2017, SIGMOD Conference.
[106] Abhishek Chandra, et al. Awan: Locality-Aware Resource Manager for Geo-Distributed Data-Intensive Applications, 2016, 2016 IEEE International Conference on Cloud Engineering (IC2E).
[107] Divyakant Agrawal, et al. Multi-representation Based Data Processing Architecture for IoT Applications, 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).
[108] Scott Shenker, et al. Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters, 2012, HotCloud.
[109] Pete Wyckoff, et al. Hive - A Warehousing Solution Over a Map-Reduce Framework, 2009, Proc. VLDB Endow.
[110] Margarida Mamede, et al. PIXIDA: Optimizing Data Parallel Jobs in Wide-Area Data Analytics, 2015, Proc. VLDB Endow.
[111] Giuseppe Di Modica, et al. H2F: A Hierarchical Hadoop Framework for Big Data Processing in Geo-Distributed Environments, 2016, 2016 IEEE/ACM 3rd International Conference on Big Data Computing Applications and Technologies (BDCAT).
[112] Silvio Lattanzi, et al. Filtering: a method for solving graph problems in MapReduce, 2011, SPAA '11.
[113] Feifei Li, et al. Efficient parallel kNN joins for large data in MapReduce, 2012, EDBT '12.
[114] Jeffrey D. Ullman, et al. Enumerating subgraph instances using map-reduce, 2012, 2013 IEEE 29th International Conference on Data Engineering (ICDE).
[115] Jun Luo, et al. Flutter: Scheduling tasks closer to data across geo-distributed datacenters, 2016, IEEE INFOCOM 2016 - The 35th Annual IEEE International Conference on Computer Communications.
[116] Chen He, et al. HOG: Distributed Hadoop MapReduce on the Grid, 2012, 2012 SC Companion: High Performance Computing, Networking Storage and Analysis.
[117] Thomas F. Wenisch, et al. Minimizing Remote Accesses in MapReduce Clusters, 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.
[118] Bingsheng He, et al. On Achieving Efficient Data Transfer for Graph Processing in Geo-Distributed Datacenters, 2017, 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS).
[119] Jeffrey D. Ullman, et al. Upper and Lower Bounds on the Cost of a Map-Reduce Computation, 2012, Proc. VLDB Endow.
[120] Michael T. Goodrich, et al. Simulating Parallel Algorithms in the MapReduce Framework with Applications to Parallel Computational Geometry, 2010, ArXiv.
[121] Murat Kantarcioglu, et al. Secure and Efficient Query Processing over Hybrid Clouds, 2017, 2017 IEEE 33rd International Conference on Data Engineering (ICDE).
[122] Aditya G. Parameswaran, et al. Fuzzy Joins Using MapReduce, 2012, 2012 IEEE 28th International Conference on Data Engineering.
[123] Jeffrey D. Ullman, et al. Bounds for Overlapping Interval Join on MapReduce, 2015, EDBT/ICDT Workshops.
[124] Ravi Kumar, et al. Pig Latin: a not-so-foreign language for data processing, 2008, SIGMOD Conference.
[125] Eli Upfal, et al. Space-round tradeoffs for MapReduce computations, 2011, ICS '12.
[126] Rui Wang, et al. Bridging Data in the Clouds: An Environment-Aware System for Geographically Distributed Data Transfers, 2014, 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[127] Giuseppe Di Modica, et al. Application profiling in hierarchical Hadoop for geo-distributed computing environments, 2016, 2016 IEEE Symposium on Computers and Communication (ISCC).
[128] Michael J. Franklin, et al. GridDB: A Database Interface to the Grid, 2003, SIGMOD 2003.
[129] Alexandru Iosup, et al. Balanced resource allocations across multiple dynamic MapReduce clusters, 2014, SIGMETRICS '14.
[130] Scott Shenker, et al. Spark: Cluster Computing with Working Sets, 2010, HotCloud.
[131] Jorge Luis Rodriguez, et al. The Open Science Grid, 2005.
[132] Zhenni Li, et al. Topology-Aware Optimal Data Placement Algorithm for Network Traffic Optimization, 2016, IEEE Transactions on Computers.
[133] Shanika Karunasekera, et al. Distributed stream clustering using micro-clusters on Apache Storm, 2017, J. Parallel Distributed Comput.
[134] Michael J. Freedman, et al. Aggregation and Degradation in JetStream: Streaming Analytics in the Wide Area, 2014, NSDI.
[135] Patrick Wendell, et al. Learning Spark: Lightning-Fast Big Data Analytics, 2015.
[136] Kyle Banker, et al. MongoDB in Action, 2011.
[137] Athanasios V. Vasilakos, et al. Multimedia Applications and Security in MapReduce: Opportunities and Challenges, 2012, Concurr. Comput. Pract. Exp.
[138] Alec Wolman, et al. Volley: Automated Data Placement for Geo-Distributed Cloud Services, 2010, NSDI.
[139] Yuan Luo, et al. Hierarchical MapReduce Programming Model and Scheduling Algorithms, 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012).
[140] Jeffrey D. Ullman. Designing good MapReduce algorithms, 2012, XRDS.
[141] Jeffrey D. Ullman, et al. Vision Paper: Towards an Understanding of the Limits of Map-Reduce Computation, 2012, ArXiv.
[142] Mirek Riedewald, et al. Processing theta-joins using MapReduce, 2011, SIGMOD '11.