Data and container placement in scalable data analytics platforms
[1] María S. Pérez-Hernández,et al. Spark Versus Flink: Understanding Performance in Big Data Analytics Frameworks , 2016, 2016 IEEE International Conference on Cluster Computing (CLUSTER).
[2] Keith Kirkpatrick,et al. Software-defined networking , 2013, CACM.
[3] Jignesh M. Patel,et al. Twitter Heron: Stream Processing at Scale , 2015, SIGMOD Conference.
[4] Jun Wang,et al. Improving metadata management for small files in HDFS , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[5] Carlo Curino,et al. Reservation-based Scheduling: If You're Late Don't Blame Us! , 2014, SoCC.
[6] C. D. Gelatt,et al. Optimization by Simulated Annealing , 1983, Science.
[7] Scott Shenker,et al. Shark: SQL and rich analytics at scale , 2012, SIGMOD '13.
[8] Astrid Rheinländer,et al. Opening the Black Boxes in Data Flow Optimization , 2012, Proc. VLDB Endow..
[9] Odej Kao,et al. CoLoc: Distributed data and container colocation for data-intensive applications , 2016, 2016 IEEE International Conference on Big Data (Big Data).
[10] Yun Tian,et al. Improving MapReduce performance through data placement in heterogeneous Hadoop clusters , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[11] Abhishek Verma,et al. Large-scale cluster management at Google with Borg , 2015, EuroSys.
[12] Pete Wyckoff,et al. Hive - A Warehousing Solution Over a Map-Reduce Framework , 2009, Proc. VLDB Endow..
[13] Reynold Xin,et al. GraphX: Graph Processing in a Distributed Dataflow Framework , 2014, OSDI.
[14] Christina Delimitrou,et al. Paragon: QoS-aware scheduling for heterogeneous datacenters , 2013, ASPLOS '13.
[15] Ralph C. Merkle,et al. Protocols for Public Key Cryptosystems , 1980, 1980 IEEE Symposium on Security and Privacy.
[16] Benjamin Hindman,et al. Dominant Resource Fairness: Fair Allocation of Multiple Resource Types , 2011, NSDI.
[17] Yi Lu,et al. AdaptDB: Adaptive Partitioning for Distributed Joins , 2017, Proc. VLDB Endow..
[18] Xubin He,et al. Implementing WebGIS on Hadoop: A case study of improving small file I/O performance on HDFS , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[19] Scott Shenker,et al. Spark: Cluster Computing with Working Sets , 2010, HotCloud.
[20] Felix Naumann,et al. Meteor/Sopremo: An Extensible Query Language and Operator Model , 2012 .
[21] Jaehwan Lee,et al. Introducing SSDs to the Hadoop MapReduce Framework , 2014, 2014 IEEE 7th International Conference on Cloud Computing.
[22] Dominic Battré,et al. Nephele/PACTs: a programming model and execution framework for web-scale analytical processing , 2010, SoCC '10.
[23] Ananth Grama,et al. UBIS: Utilization-Aware Cluster Scheduling , 2018, 2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[24] Odej Kao,et al. Endolith: A Blockchain-Based Framework to Enhance Data Retention in Cloud Storages , 2018, 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP).
[25] Bin Cheng,et al. Building a Big Data Platform for Smart Cities: Experience and Lessons from Santander , 2015, 2015 IEEE International Congress on Big Data.
[26] Dick H. J. Epema,et al. KOALA-F: A Resource Manager for Scheduling Frameworks in Clusters , 2016, 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid).
[27] Charles E. Leiserson,et al. Fat-trees: Universal networks for hardware-efficient supercomputing , 1985, IEEE Transactions on Computers.
[28] Alex Davies,et al. Scale out with GlusterFS , 2013 .
[29] Odej Kao,et al. Scheduling Recurring Distributed Dataflow Jobs Based on Resource Utilization and Interference , 2017, 2017 IEEE International Congress on Big Data (BigData Congress).
[30] Luke M. Leslie,et al. Cross-Layer Scheduling in Cloud Systems , 2015, 2015 IEEE International Conference on Cloud Engineering.
[31] Massimo Bartoletti,et al. A Survey of Attacks on Ethereum Smart Contracts (SoK) , 2017, POST.
[32] Yi Lu,et al. Amoeba: A Shape changing Storage System for Big Data , 2016, Proc. VLDB Endow..
[33] Satoshi Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System , 2009 .
[34] Albert G. Greenberg,et al. Scarlett: coping with skewed content popularity in MapReduce clusters , 2011, EuroSys '11.
[35] Michael D. Ernst,et al. HaLoop , 2010, Proc. VLDB Endow..
[36] Sungyoung Lee,et al. Adaptive Replication Management in HDFS Based on Supervised Learning , 2016, IEEE Transactions on Knowledge and Data Engineering.
[37] Yongfeng Huang,et al. Hmfs: Efficient Support of Small Files Processing over HDFS , 2014, ICA3PP.
[38] Vinay Setty,et al. Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing) , 2010, Proc. VLDB Endow..
[39] Srikanth Kandula,et al. Jockey: guaranteed job latency in data parallel clusters , 2012, EuroSys '12.
[40] Yuanyuan Tian,et al. CoHadoop: Flexible Data Placement and Its Exploitation in Hadoop , 2011, Proc. VLDB Endow..
[41] Abhinandan Das,et al. Google news personalization: scalable online collaborative filtering , 2007, WWW '07.
[42] Marios Hadjieleftheriou,et al. Distributed data placement to minimize communication costs via graph partitioning , 2014, SSDBM '14.
[43] Peter R. Pietzuch,et al. Medea: scheduling of long running applications in shared production clusters , 2018, EuroSys.
[44] Goutam Paul,et al. Exploiting Block-Chain Data Structure for Auditorless Auditing on Cloud Data , 2016, ICISS.
[45] M. Abadi,et al. Naiad: a timely dataflow system , 2013, SOSP.
[46] Gilad Mishne,et al. Fast data in the era of big data: Twitter's real-time related query suggestion architecture , 2012, SIGMOD '13.
[47] Hitesh Ballani,et al. Towards predictable datacenter networks , 2011, SIGCOMM 2011.
[48] Malte Schwarzkopf. Cluster Scheduling for Data Centers , 2017, ACM Queue.
[49] Wei Lin,et al. Apollo: Scalable and Coordinated Scheduling for Cloud-Scale Computing , 2014, OSDI.
[50] Odej Kao,et al. Nephele: efficient parallel data processing in the cloud , 2009, MTAGS '09.
[51] J. MacQueen. Some methods for classification and analysis of multivariate observations , 1967 .
[52] Michael Abd-El-Malek,et al. Omega: flexible, scalable schedulers for large compute clusters , 2013, EuroSys '13.
[53] Maged M. Michael,et al. Scale-up x Scale-out: A Case Study using Nutch/Lucene , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[54] Huan Liu,et al. GridBatch: Cloud Computing for Large-Scale Data-Intensive Batch Applications , 2008, 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID).
[55] Ning Zhang,et al. ERMS: An Elastic Replication Management System for HDFS , 2012, 2012 IEEE International Conference on Cluster Computing Workshops.
[56] Muneeb Ali,et al. Blockstack: A Global Naming and Storage System Secured by Blockchains , 2016, USENIX Annual Technical Conference.
[57] Andrew V. Goldberg,et al. Quincy: fair scheduling for distributed computing clusters , 2009, SOSP '09.
[58] Bin Xu,et al. Proactive Data Placement for Surveillance Video Processing in Heterogeneous Cluster , 2016, 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom).
[59] Felix Naumann,et al. SOFA: An extensible logical optimizer for UDF-heavy data flows , 2015, Inf. Syst..
[60] Odej Kao,et al. Continuously Improving the Resource Utilization of Iterative Parallel Dataflows , 2016, 2016 IEEE 36th International Conference on Distributed Computing Systems Workshops (ICDCSW).
[61] Scott Shenker,et al. Making Sense of Performance in Data Analytics Frameworks , 2015, NSDI.
[62] Ion Stoica,et al. The Power of Choice in Data-Aware Cluster Scheduling , 2014, OSDI.
[63] Odej Kao,et al. SMiPE: Estimating the Progress of Recurring Iterative Distributed Dataflows , 2017, 2017 18th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT).
[64] Odej Kao,et al. Addressing Hadoop's Small File Problem With an Appendable Archive File Format , 2017, Conf. Computing Frontiers.
[65] Geoffrey C. Fox,et al. Investigation of Data Locality in MapReduce , 2012, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012).
[66] Scott Shenker,et al. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling , 2010, EuroSys '10.
[67] Carlos Maltzahn,et al. Ceph: a scalable, high-performance distributed file system , 2006, OSDI '06.
[68] Cheng-Zhong Xu,et al. Interference and locality-aware task scheduling for MapReduce applications in virtual clusters , 2013, HPDC.
[69] Reynold Xin,et al. GraphX: a resilient distributed graph system on Spark , 2013, GRADES.
[70] Bin Liu,et al. EthDrive: A Peer-to-Peer Data Storage with Provenance , 2017, CAiSE-Forum-DC.
[71] Kostas Katrinis,et al. Pythia: Faster Big Data in Motion through Predictive Software-Defined Network Optimization at Runtime , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[72] Sachin Shetty,et al. ProvChain: A Blockchain-Based Data Provenance Architecture in Cloud Environment with Enhanced Privacy and Availability , 2017, 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[73] Sanjay Ghemawat,et al. The Google file system , 2003 .
[74] Robert N. M. Watson,et al. Firmament: Fast, Centralized Cluster Scheduling at Scale , 2016, OSDI.
[75] Roy H. Campbell,et al. ARIA: automatic resource inference and allocation for MapReduce environments , 2011, ICAC '11.
[76] Cristina L. Abad,et al. DARE: Adaptive Data Replication for Efficient Cluster Scheduling , 2011, 2011 IEEE International Conference on Cluster Computing.
[77] Christoforos E. Kozyrakis,et al. Heracles: Improving resource efficiency at scale , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[78] Volker Markl,et al. Spinning Fast Iterative Data Flows , 2012, Proc. VLDB Endow..
[79] Hosung Park,et al. What is Twitter, a social network or a news media? , 2010, WWW '10.
[80] Ali Raza Butt,et al. VENU: Orchestrating SSDs in Hadoop storage , 2014, 2014 IEEE International Conference on Big Data (Big Data).
[81] Brian Lee,et al. Towards Secure Provenance in the Cloud: A Survey , 2015, 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC).
[82] Matei Zaharia,et al. Job Scheduling for Multi-User MapReduce Clusters , 2009 .
[83] Yanpei Chen,et al. The Truth About MapReduce Performance on SSDs , 2014, LISA.
[84] Ching-Hsien Hsu,et al. Locality and loading aware virtual machine mapping techniques for optimizing communications in MapReduce applications , 2015, Future Gener. Comput. Syst..
[85] Christina Delimitrou,et al. Quasar: resource-efficient and QoS-aware cluster management , 2014, ASPLOS.
[86] Odej Kao,et al. Network-aware resource management for scalable data analytics frameworks , 2015, 2015 IEEE International Conference on Big Data (Big Data).
[87] Julie McLeod,et al. Opening research data: issues and opportunities , 2014 .
[88] John Murphy,et al. Towards a Better Replica Management for Hadoop Distributed File System , 2018, 2018 IEEE International Congress on Big Data (BigData Congress).
[89] Scott Shenker,et al. Discretized streams: fault-tolerant streaming computation at scale , 2013, SOSP.
[90] Odej Kao,et al. Selecting resources for distributed dataflow systems according to runtime targets , 2016, 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC).
[91] Carlo Curino,et al. Morpheus: Towards Automated SLOs for Enterprise Clusters , 2016, OSDI.
[92] Odej Kao,et al. Adaptive Resource Management for Distributed Data Analytics based on Container-level Cluster Monitoring , 2017, DATA.
[93] Joseph K. Bradley,et al. Spark SQL: Relational Data Processing in Spark , 2015, SIGMOD Conference.
[94] Yuan Yu,et al. Dryad: distributed data-parallel programs from sequential building blocks , 2007, EuroSys '07.
[95] Murat Kantarcioglu,et al. SmartProvenance: A Distributed, Blockchain Based DataProvenance System , 2018, CODASPY.
[96] Odej Kao,et al. When to Use a Distributed Dataflow Engine: Evaluating the Performance of Apache Flink , 2016, 2016 Intl IEEE Conferences on Ubiquitous Intelligence & Computing, Advanced and Trusted Computing, Scalable Computing and Communications, Cloud and Big Data Computing, Internet of People, and Smart World Congress (UIC/ATC/ScalCom/CBDCom/IoP/SmartWorld).
[97] Sanjay Ghemawat,et al. MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.
[98] Randy H. Katz,et al. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.
[99] Juan Benet,et al. IPFS - Content Addressed, Versioned, P2P File System , 2014, ArXiv.
[100] Elaine Shi,et al. Permacoin: Repurposing Bitcoin Work for Data Preservation , 2014, 2014 IEEE Symposium on Security and Privacy.
[101] Yuhong Feng,et al. An effective data locality aware task scheduling method for MapReduce framework in heterogeneous environments , 2011, 2011 International Conference on Cloud and Service Computing.
[102] Shengzhong Feng,et al. Improving Data Locality of MapReduce by Scheduling in Homogeneous Computing Environments , 2011, 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications.
[103] Scott Shenker,et al. Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks , 2014, SoCC.
[104] Geoffrey C. Fox,et al. Twister: a runtime for iterative MapReduce , 2010, HPDC '10.
[105] Seif Haridi,et al. Apache Flink™: Stream and Batch Processing in a Single Engine , 2015, IEEE Data Eng. Bull..
[106] Ishai Menache,et al. Network-Aware Scheduling for Data-Parallel Jobs: Plan When You Can , 2015, Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication.
[107] Felix Naumann,et al. The Stratosphere platform for big data analytics , 2014, The VLDB Journal.
[108] Daniel Davis Wood,et al. Ethereum: A Secure Decentralised Generalised Transaction Ledger , 2014 .
[109] Randy H. Katz,et al. Heterogeneity and dynamicity of clouds at scale: Google trace analysis , 2012, SoCC '12.
[110] Michael J. Franklin,et al. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , 2012, NSDI.
[111] Mark J. Clement,et al. Core Algorithms of the Maui Scheduler , 2001, JSSPP.
[112] Seif Haridi,et al. HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases , 2016, FAST.
[113] Odej Kao,et al. Ellis: Dynamically Scaling Distributed Dataflows to Meet Runtime Targets , 2017, 2017 IEEE International Conference on Cloud Computing Technology and Science (CloudCom).
[114] Lingjia Tang,et al. Bubble-flux: precise online QoS management for increased utilization in warehouse scale computers , 2013, ISCA.
[115] Craig Chambers,et al. The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing , 2015, Proc. VLDB Endow..
[116] Ion Stoica,et al. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics , 2016, NSDI.
[117] Hairong Kuang,et al. The Hadoop Distributed File System , 2010, 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST).
[118] Yuan Yu,et al. TensorFlow: A system for large-scale machine learning , 2016, OSDI.
[119] Carlo Curino,et al. Apache Hadoop YARN: yet another resource negotiator , 2013, SoCC.
[120] Srikanth Kandula,et al. Reoptimizing Data Parallel Computing , 2012, NSDI.