Distributed Deep Learning on Data Systems: A Comparative Analysis of Approaches
Orhan Kislal | Nandish Jayaram | Arun Kumar | Yuhao Zhang | Frank McQuillan | Nikhil Kak | Ekta Khanna | Domino Valdano
[1] Li Fei-Fei,et al. ImageNet: A large-scale hierarchical image database , 2009, CVPR.
[2] Alexander J. Smola,et al. Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.
[3] Supun Nakandala,et al. Vista: Optimized System for Declarative Feature Transfer from Deep CNNs at Scale , 2020, SIGMOD Conference.
[4] Supun Nakandala,et al. Cerebro: A Data System for Optimized Deep Learning Model Selection , 2020, Proc. VLDB Endow..
[5] Bin Cui,et al. MLog: Towards Declarative In-Database Machine Learning , 2017, Proc. VLDB Endow..
[6] Yu Cheng,et al. GLADE: big data analytics made easy , 2012, SIGMOD Conference.
[7] Carsten Binnig,et al. Democratizing Data Science through Interactive Curation of ML Pipelines , 2019, SIGMOD Conference.
[8] Nick Koudas,et al. Efficient Construction of Approximate Ad-Hoc ML models Through Materialization and Reuse , 2018, Proc. VLDB Endow..
[9] Xing Xie,et al. xDeepFM: Combining Explicit and Implicit Feature Interactions for Recommender Systems , 2018, KDD.
[10] Max Jaderberg,et al. Population Based Training of Neural Networks , 2017, ArXiv.
[11] Hung Q. Ngo,et al. In-Database Learning with Sparse Tensors , 2017, PODS.
[12] Christopher Ré,et al. Towards a unified architecture for in-RDBMS analytics , 2012, SIGMOD Conference.
[13] Zhipeng Zhang,et al. PS2: Parameter Server on Spark , 2019, SIGMOD Conference.
[14] Laure Berti-Equille,et al. Machine Learning to Data Management: A Round Trip , 2018, 2018 IEEE 34th International Conference on Data Engineering (ICDE).
[15] Zhipeng Zhang,et al. MLlib*: Fast Training of GLMs Using Spark MLlib , 2019, 2019 IEEE 35th International Conference on Data Engineering (ICDE).
[16] Chunbin Lin,et al. Accelerating Analytic Queries on Compressed Data , 2018 .
[17] Supun Nakandala,et al. Cerebro: Efficient and Reproducible Model Selection on Deep Learning Systems , 2019, DEEM@SIGMOD.
[18] Raul Castro Fernandez,et al. Ako: Decentralised Deep Learning with Partial Gradient Exchange , 2016, SoCC.
[19] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[20] Aditya G. Parameswaran,et al. Helix: Holistic Optimization for Accelerating Iterative Machine Learning , 2018, Proc. VLDB Endow..
[21] P. Alam. ‘K’ , 2021, Composites Engineering.
[22] David J. DeWitt,et al. The Object-Oriented Database System Manifesto , 1994, Building an Object-Oriented Database System, The Story of O2.
[23] Jason Weston,et al. Deep learning via semi-supervised embedding , 2008, ICML '08.
[24] Yunming Ye,et al. DeepFM: A Factorization-Machine based Neural Network for CTR Prediction , 2017, IJCAI.
[25] Christopher Ré,et al. Extracting Databases from Dark Data with DeepDive , 2016, SIGMOD Conference.
[26] David D. Cox,et al. Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures , 2013, ICML.
[27] Wenwu Zhu,et al. Structural Deep Network Embedding , 2016, KDD.
[28] Takashi Matsubara,et al. Deep learning for stock prediction using numerical and textual information , 2016, 2016 IEEE/ACIS 15th International Conference on Computer and Information Science (ICIS).
[29] Eric Eide,et al. Introducing CloudLab: Scientific Infrastructure for Advancing Cloud Architectures and Applications , 2014, ;login: Usenix Mag..
[30] Reynold Xin,et al. Lakehouse: A New Generation of Open Platforms that Unify Data Warehousing and Advanced Analytics , 2021, CIDR.
[31] Hang Su,et al. Experiments on Parallel Training of Deep Neural Network using Model Averaging , 2015, ArXiv.
[32] Shirish Tatikonda,et al. SystemML: Declarative Machine Learning on Spark , 2016, Proc. VLDB Endow..
[33] Ion Stoica,et al. Tune: A Research Platform for Distributed Model Selection and Training , 2018, ArXiv.
[34] Feng Liu,et al. Continuous Integration of Machine Learning Models with ease.ml/ci: Towards a Rigorous Yet Practical Treatment , 2019, SysML.
[35] Matthias Weidlich,et al. Crossbow: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers , 2019, Proc. VLDB Endow..
[36] Ameet Talwalkar,et al. A System for Massively Parallel Hyperparameter Tuning , 2020, MLSys.
[37] Chuck Bear,et al. Vertica-ML: Distributed Machine Learning in Vertica Database , 2020, SIGMOD Conference.
[38] Kun Li,et al. The MADlib Analytics Library or MAD Skills, the SQL , 2012, Proc. VLDB Endow..
[39] Ameet Talwalkar,et al. MLlib: Machine Learning in Apache Spark , 2015, J. Mach. Learn. Res..
[40] Arun Kumar,et al. Cerebro: A Layered Data Platform for Scalable Deep Learning , 2021, CIDR.
[41] Surajit Chaudhuri,et al. An overview of data warehousing and OLAP technology , 1997, SIGMOD Rec..
[42] Ce Zhang,et al. Ease.ml/ci and Ease.ml/meter in Action: Towards Data Management for Statistical Generalization , 2019, Proc. VLDB Endow..
[43] David Antonio Justo. Write once, rewrite everywhere: A Unified Framework for Factorized Machine Learning , 2019 .
[44] Dit-Yan Yeung,et al. Collaborative Deep Learning for Recommender Systems , 2014, KDD.
[45] Kun Li,et al. UDA-GIST: An In-database Framework to Unify Data-Parallel and State-Parallel Analytics , 2015, Proc. VLDB Endow..
[46] Ioannis Mitliagkas,et al. Parallel SGD: When does averaging help? , 2016, ArXiv.
[47] Özgür Yilmazel,et al. Apache Mahout: Machine Learning on Distributed Dataflow Systems , 2020, J. Mach. Learn. Res..
[48] Neoklis Polyzotis,et al. Data Management Challenges in Production Machine Learning , 2017, SIGMOD Conference.
[49] Shai Ben-David,et al. Understanding Machine Learning: From Theory to Algorithms , 2014 .
[50] Gang Chen,et al. SINGA: Putting Deep Learning in the Hands of Multimedia Users , 2015, ACM Multimedia.
[51] Manasi Vartak,et al. ModelDB: a system for machine learning model management , 2016, HILDA '16.
[52] C. Jermaine,et al. Tensor Relational Algebra for Distributed Machine Learning System Design , 2020, Proc. VLDB Endow..
[53] Chris Jermaine,et al. Declarative Parameterizations of User-Defined Functions for Large-Scale Machine Learning and Optimization , 2019, IEEE Transactions on Knowledge and Data Engineering.
[54] D. Sculley,et al. Google Vizier: A Service for Black-Box Optimization , 2017, KDD.
[55] Chris Jermaine,et al. Declarative Recursive Computation on an RDBMS , 2019, Proc. VLDB Endow..
[56] Dynamic parameter allocation in parameter servers , 2020, Proc. VLDB Endow..
[57] Jun Yang,et al. Data Management in Machine Learning: Challenges, Techniques, and Systems , 2017, SIGMOD Conference.
[58] Juliana Freire,et al. Visus: An Interactive System for Automatic Machine Learning Model Building and Curation , 2019, HILDA@SIGMOD.
[59] Chris Jermaine,et al. Declarative Recursive Computation on an RDBMS, or, Why You Should Use a Database For Distributed Machine Learning , 2019, ArXiv.
[60] Beng Chin Ooi,et al. Rafiki: Machine Learning as an Analytics Service System , 2018, Proc. VLDB Endow..
[61] Dennis Shasha,et al. Debugging Machine Learning Pipelines , 2019, DEEM@SIGMOD.
[62] Christopher De Sa,et al. Data Programming: Creating Large Training Sets, Quickly , 2016, NIPS.
[63] Stephan Günnemann,et al. MLearn: A Declarative Machine Learning Language for Database Systems , 2019, DEEM@SIGMOD.
[64] Anthony K. H. Tung,et al. SINGA: A Distributed Deep Learning Platform , 2015, ACM Multimedia.
[65] Wei Wang,et al. Effective deep learning-based multi-modal retrieval , 2016, VLDB 2016.
[66] Quanshi Zhang,et al. Visual interpretability for deep learning: a survey , 2018, Frontiers of Information Technology & Electronic Engineering.
[67] Benjamin Recht,et al. KeystoneML: Optimizing Pipelines for Large-Scale Advanced Analytics , 2016, 2017 IEEE 33rd International Conference on Data Engineering (ICDE).
[68] Tim Kraska,et al. ARDA , 2020, Proc. VLDB Endow..
[69] Xin Zhang,et al. TFX: A TensorFlow-Based Production-Scale Machine Learning Platform , 2017, KDD.
[70] Jeffrey F. Naughton,et al. Model Selection Management Systems: The Next Frontier of Advanced Analytics , 2016, SIGMOD Rec..
[71] Atsuo Yoshitaka,et al. A Survey on Content-Based Retrieval for Multimedia Databases , 1999, IEEE Trans. Knowl. Data Eng..
[72] Abutalib Aghayev,et al. Litz: Elastic Framework for High-Performance Distributed Machine Learning , 2018, USENIX Annual Technical Conference.
[73] Tilmann Rabl,et al. An Intermediate Representation for Optimizing Machine Learning Pipelines , 2019, Proc. VLDB Endow..
[74] Claire Laybats,et al. GDPR , 2018, Business Information Review.
[75] Harm de Vries,et al. RMSProp and equilibrated adaptive learning rates for non-convex optimization. , 2015 .
[76] Stephen H. Bach,et al. Snorkel: rapid training data creation with weak supervision , 2019, The VLDB Journal.
[77] Bettina Kemme,et al. AIDA - Abstraction for Advanced In-Database Analytics , 2018, Proc. VLDB Endow..
[78] Sanjay Krishnan,et al. BoostClean: Automated Error Detection and Repair for Machine Learning , 2017, ArXiv.
[79] Nishant Agarwal. A Real-time Temporal Clustering Algorithm for short text, and its applications , 2017 .
[80] Alexander J. Smola,et al. Parallelized Stochastic Gradient Descent , 2010, NIPS.
[81] Matthew Rocklin,et al. Dask: Parallel Computation with Blocked algorithms and Task Scheduling , 2015, SciPy.
[82] Shirish Tatikonda,et al. Hybrid Parallelization Strategies for Large-Scale Machine Learning in SystemML , 2014, Proc. VLDB Endow..
[83] Shu Lin,et al. DISIMA: a distributed and interoperable image database system , 2000, SIGMOD '00.
[84] Masahito Hirakawa,et al. MORE: An Object-Oriented Data Model with a Facility for Changing Object Structures , 1991, IEEE Trans. Knowl. Data Eng..
[85] Sanjay Krishnan,et al. ActiveClean: Interactive Data Cleaning For Statistical Modeling , 2016, Proc. VLDB Endow..
[86] Christopher Ré,et al. Probabilistic Management of OCR Data using an RDBMS , 2011, Proc. VLDB Endow..
[87] Michael N. Gubanov,et al. Scalable Linear Algebra on a Relational Database System , 2017, 2017 IEEE 33rd International Conference on Data Engineering (ICDE).
[88] Gang Fu,et al. Deep & Cross Network for Ad Click Predictions , 2017, ADKDD@KDD.
[89] Carlo Curino,et al. Extending Relational Query Processing with ML Inference , 2019, CIDR.
[90] Susie Stephens,et al. Oracle Data Mining , 2005 .
[91] Jeffrey F. Naughton,et al. Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent , 2017, SIGMOD Conference.
[92] Alin Deutsch,et al. Vertex-centric Parallel Computation of SQL Queries , 2021, SIGMOD Conference.
[93] Felix Bießmann,et al. On Challenges in Machine Learning Model Management , 2018, IEEE Data Eng. Bull..
[94] 장윤희,et al. Y. , 2003, Industrial and Labor Relations Terms.
[95] Alun D. Preece,et al. Interpretability of deep learning models: A survey of results , 2017, 2017 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computed, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI).
[96] Jiaheng Lu,et al. Tutorial Proposal: Synergy of Database Techniques and Machine Learning Models for String Similarity Search and Join , 2019 .
[97] Xavier Bouthillier,et al. Survey of machine-learning experimental methods at NeurIPS2019 and ICLR2020 , 2020 .
[98] Berthold Reinwald,et al. On Optimizing Operator Fusion Plans for Large-Scale Machine Learning in SystemML , 2018, Proc. VLDB Endow..
[99] Christopher Ré,et al. Brainwash: A Data System for Feature Engineering , 2013, CIDR.
[100] Christopher Ré,et al. Snorkel: Rapid Training Data Creation with Weak Supervision , 2017, Proc. VLDB Endow..
[101] Michael I. Jordan,et al. Ray: A Distributed Framework for Emerging AI Applications , 2017, OSDI.
[102] Xiaogang Wang,et al. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[103] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..
[104] Tilmann Rabl,et al. Optimizing Machine Learning Workloads in Collaborative Environments , 2020, SIGMOD Conference.
[105] Samuel Madden,et al. MODELDB: Opportunities and Challenges in Managing Machine Learning Models , 2018, IEEE Data Eng. Bull..
[106] Markus Weimer,et al. Vamsa: Automated Provenance Tracking in Data Science Scripts , 2020, KDD.
[107] Stefan Manegold,et al. Deep Integration of Machine Learning Into Column Stores , 2018, EDBT.
[108] Carsten Binnig,et al. DB4ML - An In-Memory Database Kernel with Machine Learning Support , 2020, SIGMOD Conference.
[109] Carlos Ordonez,et al. Integrating K-means clustering with a relational DBMS using SQL , 2006, IEEE Transactions on Knowledge and Data Engineering.
[110] Fan Yang,et al. FlexPS: Flexible Parallelism Control in Parameter Server Architecture , 2018, Proc. VLDB Endow..
[111] Frederick Reiss,et al. Compressed linear algebra for large-scale machine learning , 2016, The VLDB Journal.
[112] K. Selçuk Candan,et al. Efficient Static and Dynamic In-Database Tensor Decompositions on Chunk-Based Array Stores , 2014, CIKM.
[113] Carlo Curino,et al. Cloudy with high chance of DBMS: a 10-year prediction for Enterprise-Grade ML , 2020, CIDR.