Methodological Principles for Reproducible Performance Evaluation in Cloud Computing (SPEC RG Cloud Working Group)
Cristina L. Abad | Alexandru Iosup | Jóakim von Kistowski | Ahmed Ali-Eldin | Alessandro Vittorio Papadopoulos | Petr Tůma | Laurens Versluis | Nikolas Herbst | Andre Bauer | José Nelson Amaral
[1] Allen D. Malony, et al. Models for performance perturbation analysis, 1991, PADD '91.
[2] Antonín Steinhauser, et al. DOs and DON'Ts of Conducting Performance Measurements in Java, 2015, ICPE.
[3] Petr Tůma, et al. Precise Regression Benchmarking with Random Effects: Improving Mono Benchmark Results, 2006, EPEW.
[4] Grigori Melnik, et al. On the success of empirical studies in the international conference on software engineering, 2006, ICSE.
[5] Ian T. Jolliffe. Exploratory and Multivariate Data Analysis, 1993.
[6] Jóakim von Kistowski, et al. How to Build a Benchmark, 2015, ICPE.
[7] Rouven Krebs, et al. Ready for Rain? A View from SPEC Research on the Future of Cloud Metrics, 2016, ArXiv.
[8] Tim Brecht, et al. Conducting Repeatable Experiments in Highly Variable Cloud Computing Environments, 2017, ICPE.
[9] Alexandru Iosup, et al. Sampling Bias in BitTorrent Measurements, 2010, Euro-Par.
[10] Mike Hibler, et al. Apt: A Platform for Repeatable Research in Computer Science, 2015, OPSR.
[11] Grigori Fursin, et al. Collective Knowledge: Towards R&D sustainability, 2016, Design, Automation & Test in Europe Conference & Exhibition (DATE).
[12] Balachander Krishnamurthy, et al. A Socratic method for validation of measurement-based networking research, 2011, Comput. Commun.
[13] Cristina L. Abad, et al. Methodological Principles for Reproducible Performance Evaluation in Cloud Computing, 2019, IEEE Transactions on Software Engineering.
[14] J. Fleiss. Measuring nominal scale agreement among many raters, 1971.
[15] Eric Eide, et al. An Experimentation Workbench for Replayable Networking Research, 2007, NSDI.
[16] Bruno Schulze, et al. High Performance Computing Evaluation: A methodology based on Scientific Application Requirements, 2014, ArXiv.
[17] G. Annas, et al. The whole truth and nothing but the truth?, 1988, The Hastings Center Report.
[18] Pearl Brereton, et al. Systematic literature reviews in software engineering - A systematic literature review, 2009, Inf. Softw. Technol.
[19] Steven Hand, et al. The Seven Deadly Sins of Cloud Computing Research, 2012, HotCloud.
[20] Walter Binder, et al. The JVM is not observable enough (and what to do about it), 2012, VMIL '12.
[21] Kai Petersen, et al. Systematic Mapping Studies in Software Engineering, 2008, EASE.
[22] Cristina L. Abad, et al. Quantifying Cloud Performance and Dependability, 2018, ACM Trans. Model. Perform. Evaluation Comput. Syst.
[23] Anton Nekrutenko, et al. Ten Simple Rules for Reproducible Computational Research, 2013, PLoS Comput. Biol.
[24] T. Hesterberg, et al. What Teachers Should Know About the Bootstrap: Resampling in the Undergraduate Statistics Curriculum, 2014, The American Statistician.
[25] Jacob Cohen. Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit, 1968.
[26] Samuel Kounev, et al. On the Value of Service Demand Estimation for Auto-scaling, 2018, MMB.
[27] Allen D. Malony, et al. Performance Measurement Intrusion and Perturbation Analysis, 1992, IEEE Trans. Parallel Distributed Syst.
[28] Michael Ley. The DBLP Computer Science Bibliography: Evolution, Research Issues, Perspectives, 2002, SPIRE.
[29] Alessandro V. Papadopoulos, et al. An Experimental Performance Evaluation of Autoscalers for Complex Workflows, 2018.
[30] Kai Petersen, et al. Guidelines for conducting systematic mapping studies in software engineering: An update, 2015, Inf. Softw. Technol.
[31] Mohamed Sayeed, et al. Measuring High-Performance Computing with Real Applications, 2008, Computing in Science & Engineering.
[32] Andreas Zeller, et al. The Truth, The Whole Truth, and Nothing But the Truth, 2016, ACM Trans. Program. Lang. Syst.
[33] Dror G. Feitelson, et al. Pitfalls in Parallel Job Scheduling Evaluation, 2005, JSSPP.
[34] José Nelson Amaral, et al. The Alberta Workloads for the SPEC CPU 2017 Benchmark Suite, 2018, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[35] Alexandru Iosup, et al. IaaS cloud benchmarking: approaches, challenges, and experience, 2013, HotTopiCS '13.
[36] ISO. Measurement Uncertainty and Probability: Guide to the Expression of Uncertainty in Measurement, 1995.
[37] Larry L. Peterson, et al. Using PlanetLab for network research: myths, realities, and best practices, 2005, OPSR.
[38] Michael F. P. O'Boyle, et al. Rapidly Selecting Good Compiler Optimizations using Performance Counters, 2007, International Symposium on Code Generation and Optimization (CGO'07).
[39] Amela Karahasanovic, et al. A survey of controlled experiments in software engineering, 2005, IEEE Transactions on Software Engineering.
[40] Alexandru Iosup, et al. On the Performance Variability of Production Cloud Services, 2011, 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing.
[41] S. Matteson. The truth, the whole truth, and nothing but the truth, 2012, Texas Dental Journal.
[42] Jorge-Arnulfo Quiané-Ruiz, et al. Runtime measurements in the cloud, 2010, Proc. VLDB Endow.
[43] Lieven Eeckhout, et al. Measuring benchmark similarity using inherent program characteristics, 2006, IEEE Transactions on Computers.
[44] Thomas C. Herndon, et al. Does high public debt consistently stifle economic growth? A critique of Reinhart and Rogoff, 2014.
[45] Matthias Hauswirth, et al. Evaluating the accuracy of Java profilers, 2010, PLDI '10.
[46] Thomas Reidemeister, et al. DataMill: rigorous performance evaluation made easy, 2013, ICPE '13.
[47] Sneha Kumar Kasera, et al. The Flexlab Approach to Realistic Evaluation of Networked Systems, 2007, NSDI.
[48] Philip J. Fleming, et al. How not to lie with statistics: the correct way to summarize benchmark results, 1986, CACM.
[49] Barry N. Taylor, et al. Guidelines for Evaluating and Expressing the Uncertainty of NIST Measurement Results, 2017.
[50] Philipp Leitner, et al. Patterns in the Chaos - A Study of Performance Variation and Predictability in Public IaaS Clouds, 2014, ACM Trans. Internet Techn.
[51] Jóakim von Kistowski, et al. SPEC CPU2017: Next-Generation Compute Benchmark, 2018, ICPE Companion.
[52] Pearl Brereton, et al. Performing systematic literature reviews in software engineering, 2006, ICSE.
[53] Dror G. Feitelson. Resampling with Feedback - A New Paradigm of Using Workload Data for Performance Evaluation, 2016, Euro-Par.
[54] Lieven Eeckhout, et al. Microarchitecture-Independent Workload Characterization, 2007, IEEE Micro.
[55] J. R. Landis, et al. The measurement of observer agreement for categorical data, 1977, Biometrics.
[56] Lieven Eeckhout, et al. Statistically rigorous Java performance evaluation, 2007, OOPSLA.
[57] Torsten Hoefler, et al. Scientific Benchmarking of Parallel Computing Systems: Twelve ways to tell the masses when reporting performance results, 2017.
[58] John R. Mashey, et al. War of the benchmark means: time for a truce, 2004, CARN.
[59] Klaus-Dieter Lange, et al. Identifying Shades of Green: The SPECpower Benchmarks, 2009, Computer.
[60] David J. Lilja. Measuring Computer Performance: A Practitioner's Guide, 2000.
[61] Andrew Lumsdaine, et al. The Value of Variance, 2016, ICPE.
[62] Samuel Kounev, et al. Chameleon: A Hybrid, Proactive Auto-Scaling Mechanism on a Level-Playing Field, 2019, IEEE Transactions on Parallel and Distributed Systems.
[63] Cees T. A. M. de Laat, et al. A Medium-Scale Distributed System for Computer Science Research: Infrastructure for the Long Term, 2016, Computer.
[64] Y. Zhang, et al. DataMill: a distributed heterogeneous infrastructure for robust experimentation, 2016, Softw. Pract. Exp.
[65] Carsten Franke, et al. On Grid Performance Evaluation Using Synthetic Workloads, 2006, JSSPP.
[66] Johan Tordsson, et al. PEAS: A Performance Evaluation Framework for Auto-Scaling Strategies in Cloud Applications, 2016, ACM Trans. Model. Perform. Evaluation Comput. Syst.
[67] Jan Vitek, et al. R3: repeatability, reproducibility and rigor, 2012, SIGPLAN Notices.
[68] Nick McKeown, et al. Reproducible network experiments using container-based emulation, 2012, CoNEXT '12.
[69] David A. W. Soergel. Rampant software errors may undermine scientific results, 2014, F1000Research.
[70] Christian P. Robert. Statistics Done Wrong: The Woefully Complete Guide, 2016.
[71] David W. Flater. The ghost in the machine: Don't let it haunt your software performance measurements, 2014.
[72] S. S. Stevens. On the Theory of Scales of Measurement, 1946, Science.
[73] Robert N. M. Watson, et al. Queues Don't Matter When You Can JUMP Them!, 2015, NSDI.
[74] Christian S. Collberg, et al. Repeatability in computer systems research, 2016, Commun. ACM.
[75] Ronald F. Boisvert. Incentivizing reproducibility, 2016, Commun. ACM.
[76] Matthias Hauswirth, et al. Producing wrong data without doing anything obviously wrong!, 2009, ASPLOS.
[77] Alexandru Iosup, et al. An Experimental Performance Evaluation of Autoscaling Policies for Complex Workflows, 2017, ICPE.
[78] Jacob Cohen. A Coefficient of Agreement for Nominal Scales, 1960.
[79] D. Feitelson. Experimental Computer Science: The Need for a Cultural Change, 2006.
[80] Erik Elmroth, et al. KPI-Agnostic Control for Fine-Grained Vertical Elasticity, 2017, 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID).
[81] Alexandru Iosup, et al. Benchmarking in the Cloud: What It Should, Can, and Cannot Be, 2012, TPCTC.
[82] Samuel Kounev, et al. BUNGEE: An Elasticity Benchmark for Self-Adaptive IaaS Cloud Environments, 2015, IEEE/ACM 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems.