Handling Iterations in Distributed Dataflow Systems
Volker Markl | Juan Soto | Gábor E. Gévay