Providing Cooperative Data Analytics for Real Applications Using Machine Learning

This paper presents a data analytics system that selects optimal analytics algorithms by selectively testing a range of candidate algorithms and optimizing their parameters using Transformer-Estimator Graphs. The system targets situations in which multiple clients need to perform calculations on the same data sets: clients cooperate by sharing results and avoiding redundant calculations, and computations may be distributed across multiple nodes, including both client and server nodes. We provide multiple options for handling changes to data sets, depending on the data consistency requirements of applications. A key contribution of this work is the Transformer-Estimator Graph, a mechanism for specifying a wide variety of options for machine learning modeling and prediction, and we show how Transformer-Estimator Graphs can be applied to analyzing time series data. To make the system easy to use, we provide solution templates customized to problems in specific domains, as illustrated by the sketch below.
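The abstract does not include code, so the following is a minimal sketch of how a Transformer-Estimator Graph might be represented and searched, assuming the graph can be linearized into stages that each offer alternative transformer or estimator nodes. The stage names, the `graph_stages` structure, and the enumeration strategy are hypothetical illustrations using scikit-learn-style components, not the paper's actual API.

```python
# Hypothetical sketch: enumerate paths through a small "Transformer-Estimator Graph",
# where each stage lists alternative nodes, and keep the best-scoring pipeline.
from itertools import product

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Each stage of the (linearized) graph lists alternative nodes to try.
graph_stages = {
    "scale": [("standard", StandardScaler()), ("minmax", MinMaxScaler())],
    "estimate": [("ridge", Ridge(alpha=1.0)),
                 ("forest", RandomForestRegressor(n_estimators=50, random_state=0))],
}

X, y = load_diabetes(return_X_y=True)

best_score, best_path = float("-inf"), None
# Enumerate every path through the graph: one node chosen per stage.
for (scale_name, scaler), (est_name, estimator) in product(*graph_stages.values()):
    pipeline = Pipeline([("scale", scaler), ("estimate", estimator)])
    score = cross_val_score(pipeline, X, y, cv=5).mean()  # mean R^2 across folds
    if score > best_score:
        best_score, best_path = score, (scale_name, est_name)

print(f"best path: {best_path}, mean CV score: {best_score:.3f}")
```

In a cooperative setting as described above, each evaluated path could be cached and shared so that other clients working on the same data set skip paths that have already been scored; that caching layer is outside the scope of this sketch.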
