Machine learning has enabled many interesting applications and is used extensively in big data systems. The popular approach of training machine learning models in frameworks such as TensorFlow, PyTorch, and Keras requires moving data from database engines to analytical engines, which imposes excessive overhead on data scientists and becomes a performance bottleneck for model training. In this demonstration, we give a practical exhibition of a solution that enables distributed machine learning natively inside database engines. During the demo, the audience will interactively use Python APIs in Jupyter notebooks to train multiple linear regression models on synthetic regression datasets and neural network models on vision and sensory datasets directly inside the Teradata SQL Engine.

PVLDB Reference Format:
Sandeep Singh Sandha, Wellington Cabrera, Mohammed Al-Kateb, Sanjay Nair, and Mani Srivastava. In-Database Distributed Machine Learning: Demonstration in Teradata. PVLDB, 12(12): 1854-1857, 2019.
DOI: https://doi.org/10.14778/3352063.3352083
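The following minimal sketch illustrates distributed training inside a shared-nothing engine. It uses plain Python with no database involved. It assumes that rows are spread across parallel units; Teradata calls these AMPs, and here they are modeled simply as Python lists. Each partition computes a partial gradient over its own rows, and a coordinator sums the partials to take one step of full-batch gradient descent for linear regression. This is a generic synchronous data-parallel scheme, not necessarily the algorithm implemented in the demonstrated system. The function names `local_gradient`, `train_partitioned`, and `make_synthetic`, the learning rate, and the synthetic dataset are all hypothetical.

```python
import random


def local_gradient(partition, w, b):
    """Partial gradient sums for one data partition (hypothetical per-unit step).

    partition: list of (x, y) rows, where x is a list of floats.
    Returns (per-feature sums of err * x_j, sum of err, row count).
    """
    gw = [0.0] * len(w)
    gb = 0.0
    for x, y in partition:
        err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
        for j, xj in enumerate(x):
            gw[j] += err * xj
        gb += err
    return gw, gb, len(partition)


def train_partitioned(partitions, n_features, lr=0.5, epochs=300):
    """Synchronous data-parallel full-batch gradient descent for linear regression."""
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        # In a shared-nothing engine these calls would run in parallel, one per
        # partition; only the small partial sums leave a partition, never the rows.
        partials = [local_gradient(p, w, b) for p in partitions]
        n = sum(count for _, _, count in partials)
        # The coordinator combines partials into the gradient of half the mean squared error.
        gw = [sum(p[0][j] for p in partials) / n for j in range(n_features)]
        gb = sum(p[1] for p in partials) / n
        w = [wi - lr * gj for wi, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def make_synthetic(n_rows=400, seed=0):
    """Hypothetical synthetic data: y = 3*x1 - 2*x2 + 1 + small Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = 3 * x[0] - 2 * x[1] + 1 + rng.gauss(0, 0.05)
        rows.append((x, y))
    return rows


if __name__ == "__main__":
    rows = make_synthetic()
    # Round-robin split into 4 hypothetical partitions (stand-ins for parallel units).
    partitions = [rows[i::4] for i in range(4)]
    w, b = train_partitioned(partitions, n_features=2)
    print("weights:", [round(v, 2) for v in w], "intercept:", round(b, 2))
```

The summed partial gradients equal the full-batch gradient, so in this sketch the learned model does not depend on how the rows are partitioned. Only a few numbers per partition cross the interconnect in each step. A scheme that instead trains a separate model on each partition and averages the models, as in federated-style training, communicates less often but is not exactly equivalent to training on the pooled data.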