The Case for Learning-and-System Co-design

While decision-making in systems is commonly handled with explicit rules and heuristics, machine learning (ML) and deep learning (DL) have been driving a paradigm shift in modern system design. Our decade of experience operating a large production cloud system, Web Search, shows that learning fills the gap in comprehending and taming the complexity of system design and operation. However, rather than just improving specific ML/DL algorithms or system features, we posit that the key to unlocking the full potential of learning-augmented systems is a principled methodology that promotes learning-and-system co-design. On this basis, we present AutoSys, a common framework for developing learning-augmented systems.
