HotML: A DSM-based machine learning system for social networks

Abstract In big data era, social networks, such as Twitter, Weibo, Facebook, are becoming more and more popular worldwide. To help social networks analysis, many machine learning (ML) algorithms have been adopted, e.g. user classification, link prediction, sentiment analysis, recommendations, etc. However, the dataset could be so large that it might take even days to train a model on a machine learning system. Performance issues should be considered to boost the training process. In this paper, we proposed HotML, a general machine learning system. HotML is designed in the parameter server (PS) architecture where the servers manage the globally shared parameters organized in tabular structure, and the workers compute the dataset in parallel and update the global parameters. HotML is based on our prior work DPS that provides high-level data abstraction, lightweight task scheduling system, and SSP consistency. HotML improved the DPS design by decoupling PS server and PS worker physically, and provides flexible consistency models including SSPPush, SSPDrop besides SSP, fault tolerance including consistent server-side checkpoint and flexible worker-side checkpoint, and workload balancing. To demonstrate the performance and scalability of the proposed system, a series of experiments are conducted and the results show that HotML can reduce networking time by about 74%, and achieve up to 1.9× performance compared to the popular ML system, Petuum.

[1]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[2]  H. Howie Huang,et al.  Enterprise: breadth-first graph traversal on GPUs , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.

[3]  H. Howie Huang,et al.  iBFS: Concurrent Breadth-First Search on GPUs , 2016, SIGMOD Conference.

[4]  Wei Li,et al.  Tux2: Distributed Graph Computation for Machine Learning , 2017, NSDI.

[5]  Alexander J. Smola,et al.  Scalable inference in latent variable models , 2012, WSDM '12.

[6]  Allan Porterfield,et al.  The Tera computer system , 1990 .

[7]  Johan Bollen,et al.  Twitter mood predicts the stock market , 2010, J. Comput. Sci..

[8]  Inderjit S. Dhillon,et al.  Generalized Nonnegative Matrix Approximations with Bregman Divergences , 2005, NIPS.

[9]  Martin Wattenberg,et al.  Ad click prediction: a view from the trenches , 2013, KDD.

[10]  Alexander J. Smola,et al.  Like like alike: joint friendship and interest propagation in social networks , 2011, WWW.

[11]  David Madigan,et al.  Large-Scale Bayesian Logistic Regression for Text Categorization , 2007, Technometrics.

[12]  Allan Porterfield,et al.  Exploiting heterogeneous parallelism on a multithreaded multiprocessor , 1992, ICS '92.

[13]  Martin Ester,et al.  A matrix factorization technique with trust propagation for recommendation in social networks , 2010, RecSys '10.

[14]  Alexander J. Smola,et al.  An architecture for parallel topic models , 2010, Proc. VLDB Endow..

[15]  Jianxin Li,et al.  Towards an efficient snapshot approach for virtual machines in clouds , 2017, Inf. Sci..

[16]  Jieping Ye,et al.  Large-scale sparse logistic regression , 2009, KDD.

[17]  Jure Leskovec,et al.  Overlapping community detection at scale: a nonnegative matrix factorization approach , 2013, WSDM.

[18]  Alexander J. Smola,et al.  Scaling Distributed Machine Learning with the Parameter Server , 2014, OSDI.

[19]  Vikas Sindhwani,et al.  Learning evolving and emerging topics in social media: a dynamic nmf approach with temporal regularization , 2012, WSDM '12.

[20]  Yutaka Matsuo,et al.  Earthquake shakes Twitter users: real-time event detection by social sensors , 2010, WWW '10.

[21]  Eric P. Xing,et al.  High-Performance Distributed ML at Scale through Parameter Server Consistency Models , 2014, AAAI.

[22]  Alex Hai Wang,et al.  Detecting Spam Bots in Online Social Networking Sites: A Machine Learning Approach , 2010, DBSec.

[23]  Scott Shenker,et al.  Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks , 2014, SoCC.

[24]  H. Howie Huang,et al.  Graphene: Fine-Grained IO Management for Graph Computing , 2017, FAST.

[25]  H. Sebastian Seung,et al.  Algorithms for Non-negative Matrix Factorization , 2000, NIPS.

[26]  Ke Xu,et al.  MoodLens: an emoticon-based sentiment analysis system for chinese tweets , 2012, KDD.

[27]  Seunghak Lee,et al.  More Effective Distributed ML via a Stale Synchronous Parallel Parameter Server , 2013, NIPS.

[28]  Joseph M. Hellerstein,et al.  Distributed GraphLab: A Framework for Machine Learning in the Cloud , 2012, Proc. VLDB Endow..

[29]  Jianxin Li,et al.  HotSnap: A Hot Distributed Snapshot System For Virtual Machine Cluster , 2013, LISA.

[30]  Yuan Yu,et al.  TensorFlow: A system for large-scale machine learning , 2016, OSDI.

[31]  Jure Leskovec,et al.  Predicting positive and negative links in online social networks , 2010, WWW '10.

[32]  Ana-Maria Popescu,et al.  A Machine Learning Approach to Twitter User Classification , 2011, ICWSM.

[33]  W. Karush Minima of Functions of Several Variables with Inequalities as Side Conditions , 2014 .

[34]  Hila Becker,et al.  Learning similarity metrics for event identification in social media , 2010, WSDM '10.

[35]  Jacob Nelson,et al.  Latency-Tolerant Software Distributed Shared Memory , 2015, USENIX Annual Technical Conference.

[36]  Cecilia Mascolo,et al.  Exploiting place features in link prediction on location-based social networks , 2011, KDD.

[37]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[38]  Seunghak Lee,et al.  Exploiting Bounded Staleness to Speed Up Big Data Analytics , 2014, USENIX Annual Technical Conference.

[39]  Shou-De Lin,et al.  Feature Engineering and Classifier Ensemble for KDD Cup 2010 , 2010, KDD 2010.

[40]  Yaoliang Yu,et al.  Petuum: A New Platform for Distributed Machine Learning on Big Data , 2015, IEEE Trans. Big Data.

[41]  Stephen J. Wright,et al.  Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent , 2011, NIPS.

[42]  Jinpeng Huai,et al.  Ring: Real-Time Emerging Anomaly Monitoring System Over Text Streams , 2019, IEEE Transactions on Big Data.

[43]  Ameet Talwalkar,et al.  MLlib: Machine Learning in Apache Spark , 2015, J. Mach. Learn. Res..

[44]  Trishul M. Chilimbi,et al.  Project Adam: Building an Efficient and Scalable Deep Learning Training System , 2014, OSDI.

[45]  Jianxin Li,et al.  DPS: A DSM-based Parameter Server for Machine Learning , 2017, 2017 14th International Symposium on Pervasive Systems, Algorithms and Networks & 2017 11th International Conference on Frontier of Computer Science and Technology & 2017 Third International Symposium of Creative Computing (ISPAN-FCST-ISCC).

[46]  Charles Elkan,et al.  Link Prediction via Matrix Factorization , 2011, ECML/PKDD.