Zhengchun Liu | Rajkumar Kettimuthu | Michael E. Papka | Ian T. Foster
[1] Jan Kautz,et al. UNAS: Differentiable Architecture Search Meets Reinforcement Learning , 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[2] Zhiling Lan,et al. Experience and Practice of Batch Scheduling on Leadership Supercomputers at Argonne , 2017, JSSPP.
[3] David P. Anderson,et al. SETI@home: an experiment in public-resource computing , 2002, CACM.
[4] Anne E. James,et al. Priority-grouping method for parallel multi-scheduling in Grid , 2015, J. Comput. Syst. Sci..
[5] Michael E. Papka,et al. Characterization and identification of HPC applications at leadership computing facility , 2020, ICS.
[6] Shantenu Jha,et al. SAGA BigJob: An Extensible and Interoperable Pilot-Job Abstraction for Distributed Applications and Systems , 2010, 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing.
[7] Bo Chen,et al. MnasNet: Platform-Aware Neural Architecture Search for Mobile , 2018, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Igor Sfiligoi,et al. The Pilot Way to Grid Resources Using glideinWMS , 2009, 2009 WRI World Congress on Computer Science and Information Engineering.
[9] Yong Zhao,et al. Falkon: a Fast and Light-weight tasK executiON framework , 2007, Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07).
[10] Shantenu Jha,et al. A Comprehensive Perspective on Pilot-Job Systems , 2015, ACM Comput. Surv..
[11] Dmitry N. Zotkin,et al. Job-length estimation and performance in backfilling schedulers , 1999, Proceedings. The Eighth International Symposium on High Performance Distributed Computing (Cat. No.99TH8469).
[12] Francine Berman,et al. Application-Level Scheduling on Distributed Heterogeneous Networks , 1996, Proceedings of the 1996 ACM/IEEE Conference on Supercomputing.
[13] E. M. L. Beale,et al. Global optimization using special ordered sets , 1976, Math. Program..
[14] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[15] Kaiming He,et al. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour , 2017, ArXiv.
[16] Gene M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities , 1967, AFIPS '67 (Spring).
[17] Xiangyu Zhang,et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[18] Hossein Bobarshad,et al. HyperTune: Dynamic Hyperparameter Tuning for Efficient Distribution of DNN Training Over Heterogeneous Systems , 2020, 2020 IEEE/ACM International Conference On Computer Aided Design (ICCAD).
[19] Quoc V. Le,et al. Efficient Neural Architecture Search via Parameter Sharing , 2018, ICML.
[20] Emad Barsoum,et al. Scaling Distributed Training with Adaptive Summation , 2020, MLSys.
[21] Kilian Q. Weinberger,et al. Densely Connected Convolutional Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[22] Ian T. Foster,et al. Condor-G: A Computation Management Agent for Multi-Institutional Grids , 2004, Cluster Computing.
[23] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[24] Eduardo Huedo,et al. GWpilot: Enabling multi-level scheduling in distributed infrastructures with GridWay and pilot jobs , 2015, Future Gener. Comput. Syst..
[25] Anand Sivasubramaniam,et al. An Integrated Approach to Parallel Scheduling Using Gang-Scheduling, Backfilling, and Migration , 2001, JSSPP.
[26] P. Sadayappan,et al. Characterization of backfilling strategies for parallel job scheduling , 2002, Proceedings. International Conference on Parallel Processing Workshop.
[27] Paul Marshall,et al. Improving Utilization of Infrastructure Clouds , 2011, 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[28] Quoc V. Le,et al. Don't Decay the Learning Rate, Increase the Batch Size , 2017, ICLR.
[29] Bo Chen,et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications , 2017, ArXiv.
[30] Natalia Gimelshein,et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library , 2019, NeurIPS.
[31] Douglas Thain,et al. Distributed computing in practice: the Condor experience , 2005, Concurr. Pract. Exp..
[32] Alexander Sergeev,et al. Horovod: fast and easy distributed deep learning in TensorFlow , 2018, ArXiv.
[33] Torsten Hoefler,et al. Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis , 2018.
[34] Zhiling Lan,et al. Deep Reinforcement Agent for Scheduling in HPC , 2021, 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[35] Dror G. Feitelson,et al. Supporting priorities and improving utilization of the IBM SP scheduler using slack-based backfilling , 1999, Proceedings 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IPPS/SPDP 1999.
[36] W. Allcock,et al. Job Characteristics on Large-Scale Systems: Long-Term Analysis, Quantification, and Implications , 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis.
[37] Ion Stoica,et al. HyperSched: Dynamic Resource Reallocation for Model Development on a Deadline , 2019, SoCC.
[38] Eric P. Xing,et al. Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning , 2020, OSDI.
[39] Andrea Lodi,et al. MIPLIB 2010 , 2011, Math. Program. Comput..
[40] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.
[41] Lars Schmidt-Thieme,et al. Hyp-RL : Hyperparameter Optimization by Reinforcement Learning , 2019, ArXiv.
[42] Erik Elmroth,et al. A2L2: An Application Aware Flexible HPC Scheduling Model for Low-Latency Allocation , 2015, VTDC@HPDC.