Deployment and verification of machine learning tool-chain based on Kubernetes distributed clusters

[1]  Ekaba Bisong,et al.  Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners , 2019 .

[2]  Silvio Savarese,et al.  Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Gang Yin,et al.  An Insight Into the Impact of Dockerfile Evolutionary Trajectories on Quality and Latency , 2018, 2018 IEEE 42nd Annual Computer Software and Applications Conference (COMPSAC).

[4]  Joseph Redmon,et al.  YOLOv3: An Incremental Improvement , 2018, ArXiv.

[5]  Marko Lukša,et al.  Kubernetes in Action , 2018, Kubernetes in Action.

[6]  Dhabaleswar K. Panda,et al.  Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning , 2016, EuroMPI.

[7]  Mike Amundsen,et al.  Microservice Architecture: Aligning Principles, Practices, and Culture , 2016 .

[8]  Andy Davis,et al.  TensorFlow: A System for Large-Scale Machine Learning , 2016, OSDI.

[9]  Oliver Kramer,et al.  Machine Learning for Evolution Strategies , 2016 .

[10]  Pooyan Jamshidi,et al.  Microservices Architecture Enables DevOps: Migration to a Cloud-Native Architecture , 2016, IEEE Software.

[11]  Zheng Zhang,et al.  MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems , 2015, ArXiv.

[12]  Claus Pahl,et al.  Containerization and the PaaS Cloud , 2015, IEEE Cloud Computing.

[13]  David Bernstein,et al.  Containers and Cloud: From LXC to Docker to Kubernetes , 2014, IEEE Cloud Computing.

[14]  Dirk Merkel,et al.  Docker: lightweight Linux containers for consistent development and deployment , 2014 .

[15]  M. Zaharia,et al.  Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center , 2011, NSDI.

[16]  George Bosilca,et al.  Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation , 2004, PVM/MPI.

[17]  Richard Koo,et al.  Checkpointing and Rollback-Recovery for Distributed Systems , 1986, IEEE Transactions on Software Engineering.

[18]  Nikhil Ketkar,et al.  Introduction to PyTorch , 2021, Deep Learning with Python.

[19]  Ekaba Bisong,et al.  Kubeflow and Kubeflow Pipelines , 2019, Building Machine Learning and Deep Learning Models on Google Cloud Platform.

[20]  Thomas Kluyver,et al.  Jupyter Notebooks - a publishing format for reproducible computational workflows , 2016, ELPUB.

[21]  IEEE Conference on Computer Vision and Pattern Recognition , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[22]  Dan Walsh,et al.  Design and implementation of the Sun network filesystem , 1985, USENIX Conference Proceedings.