Parallel Performance Problems on Shared-Memory Multicore Systems: Taxonomy and Observation
[1] Shameem Akhter, et al. Multi-Core Programming: Increasing Performance through Software Multi-threading, 2006.
[2] Dawson R. Engler, et al. RacerX: effective, static detection of race conditions and deadlocks, 2003, SOSP '03.
[3] Andrew Begel, et al. Analyze this! 145 questions for data scientists in software engineering, 2013, ICSE.
[4] Josep Torrellas, et al. False Sharing and Spatial Locality in Multiprocessor Caches, 1994, IEEE Trans. Computers.
[5] Rachel K. E. Bellamy, et al. How Programmers Debug, Revisited: An Information Foraging Theory Perspective, 2013, IEEE Transactions on Software Engineering.
[6] Matthias Hauswirth, et al. Evaluating the accuracy of Java profilers, 2010, PLDI '10.
[7] Alexandra Fedorova, et al. Addressing shared resource contention in multicore processors via scheduling, 2010, ASPLOS XV.
[8] Claudia Fohry, et al. Common Mistakes in OpenMP and How to Avoid Them - A Collection of Best Practices, 2005, IWOMP.
[9] Mahmut T. Kandemir, et al. Studying inter-core data reuse in multicores, 2011, SIGMETRICS '11.
[10] G. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).
[11] Anand Sivasubramaniam, et al. Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks, 2002, SIGMETRICS '02.
[12] Ahmed E. Hassan, et al. A qualitative study on performance bugs, 2012, 2012 9th IEEE Working Conference on Mining Software Repositories (MSR).
[13] Nihar R. Mahapatra, et al. The processor-memory bottleneck: problems and solutions, 1999, CROS.
[14] Nathan R. Tallent, et al. HPCTOOLKIT: tools for performance analysis of optimized parallel programs, 2010, Concurr. Comput. Pract. Exp.
[15] Koen De Bosschere. Upcoming Computing System Challenges - The HiPEAC Vision (Anstehende Herausforderungen der Computer Industrie - Die HiPEAC Vision), 2008, it Inf. Technol.
[16] Clay P. Breshears. The Art of Concurrency - A Thread Monkey's Guide to Writing Parallel Applications, 2009.
[17] Jonathan Walpole, et al. Is Parallel Programming Hard, And If So, Why?, 2009.
[18] Thomas E. Anderson, et al. The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors, 1990, IEEE Trans. Parallel Distributed Syst.
[19] John A. Fotheringham, et al. Dynamic storage allocation in the Atlas computer, including an automatic use of a backing store, 1961, Commun. ACM.
[20] Allen D. Malony, et al. PerfExplorer: A Performance Data Mining Framework For Large-Scale Parallel Computing, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[21] Ryan Newton, et al. Capturing and Composing Parallel Patterns with Intel CnC, 2010.
[22] Allen D. Malony, et al. The Tau Parallel Performance System, 2006, Int. J. High Perform. Comput. Appl.
[23] Ravi Rajwar, et al. Speculative lock elision: enabling highly concurrent multithreaded execution, 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.
[24] Shan Lu, et al. Toddler: Detecting performance problems via similar memory-access patterns, 2013, 2013 35th International Conference on Software Engineering (ICSE).
[25] Steven A. Hofmeyr, et al. Oversubscription on multicore processors, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[26] Lars Koesterke, et al. PerfExpert: An Easy-to-Use Performance Diagnosis Tool for HPC Applications, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[27] Mordechai Ben-Ari, et al. Principles of concurrent programming, 1982.
[28] Thomas M. Conte, et al. Embedded Multicore Processors and Systems, 2009, IEEE Micro.
[29] David A. Patterson, et al. Computer Architecture, Fifth Edition: A Quantitative Approach, 2011.
[30] David A. Patterson, et al. Computer Architecture: A Quantitative Approach, 1969.
[31] Sigrid Eldh. Software Testing Techniques, 2007.
[32] Guru Venkataramani, et al. DeFT: Design space exploration for on-the-fly detection of coherence misses, 2011, TACO.
[33] Wenli Zhang, et al. HaLock: Hardware-assisted lock contention detection in multithreaded applications, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[34] Lieven Eeckhout, et al. Undersubscribed threading on clustered cache architectures, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[35] Koen De Bosschere, et al. The HiPEAC Vision 2010, 2010.
[36] Caitlin Sadowski, et al. The last mile: parallel programming and usability, 2010, FoSER '10.
[37] Boris Beizer, et al. Software testing techniques (2nd ed.), 1990.
[38] Emerson R. Murphy-Hill, et al. Cowboys, ankle sprains, and keepers of quality: how is video game development different from software development?, 2014, ICSE.
[39] Michael Wolfe, et al. Data dependence and its application to parallel processing, 2005, International Journal of Parallel Programming.
[40] David Gregg, et al. Design considerations for parallel performance tools, 2014, CHI.
[41] Dongmei Zhang, et al. Performance debugging in the large via mining millions of stack traces, 2012, 2012 34th International Conference on Software Engineering (ICSE).
[42] Janak H. Patel, et al. A low-overhead coherence solution for multiprocessors with private cache memories, 1984, ISCA '84.
[43] Klaas-Jan Stol, et al. Two's company, three's a crowd: a case study of crowdsourcing software development, 2014, ICSE.
[44] Hans-Wolfgang Loidl, et al. Algorithm + strategy = parallelism, 1998, Journal of Functional Programming.
[45] Murray Cole, et al. Algorithmic Skeletons: Structured Management of Parallel Computation, 1989.
[46] Ulrich Drepper, et al. What Every Programmer Should Know About Memory, 2007.
[47] Nathan R. Tallent, et al. Analyzing lock contention in multithreaded applications, 2010, PPoPP '10.
[48] Matt Bishop, et al. Checking for Race Conditions in File Accesses, 1996, Comput. Syst.
[49] Michael T. Heath, et al. Visualizing the performance of parallel programs, 1991, IEEE Software.
[50] Bruno R. Preiss, et al. Architectural Skeletons: The Re-Usable Building-Blocks for Parallel Applications, 1999, PDPTA.
[51] James Demmel, et al. The Parallel Computing Landscape, 2022.
[52] Thomas Fritz, et al. Using information fragments to answer the questions developers ask, 2010, 2010 ACM/IEEE 32nd International Conference on Software Engineering.
[53] Robert J. Fowler, et al. NUMA policies and their relation to memory architecture, 1991, ASPLOS IV.
[54] Xiaoyan Zhu, et al. Does bug prediction support human developers? Findings from a Google case study, 2013, 2013 35th International Conference on Software Engineering (ICSE).
[55] Brad A. Myers, et al. An Exploratory Study of How Developers Seek, Relate, and Collect Relevant Information during Software Maintenance Tasks, 2006, IEEE Transactions on Software Engineering.
[56] Lawrence Snyder, et al. Poker on the Cosmic Cube: The First Retargetable Parallel Programming Language and Environment, 1986, ICPP.
[57] David R. O'Hallaron, et al. Computer Systems: A Programmer's Perspective, 1991.
[58] Thomas Fahringer. Automatic Performance Prediction of Parallel Programs, 1996, Springer US.
[59] Rachel K. E. Bellamy, et al. The whats and hows of programmers' foraging diets, 2013, CHI.
[60] Robert A. van de Geijn, et al. Anatomy of high-performance matrix multiplication, 2008, TOMS.
[61] Ajit Singh, et al. Design Patterns for Parallel Programming, 1996, PDPTA.
[62] Sally A. McKee, et al. An Approach to Performance Prediction for Parallel Applications, 2005, Euro-Par.
[63] Yan Solihin, et al. Fair cache sharing and partitioning in a chip multiprocessor architecture, 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004.
[64] Rohit Chandra, et al. Parallel programming in OpenMP, 2000.
[65] A. Viera, et al. Understanding interobserver agreement: the kappa statistic, 2005, Family Medicine.
[66] José G. Castaños, et al. Eliminating global interpreter locks in Ruby through hardware transactional memory, 2014, PPoPP '14.
[67] Nathan Clark, et al. Thread tailor: dynamically weaving threads together for efficient, adaptive parallel applications, 2010, ISCA.
[68] Alan Mycroft, et al. Limits of parallelism using dynamic dependency graphs, 2009, WODA '09.
[69] J. R. Landis, et al. The measurement of observer agreement for categorical data, 1977, Biometrics.
[70] David M. Nicol, et al. Performance prediction of a parallel simulator, 1999, Proceedings Thirteenth Workshop on Parallel and Distributed Simulation. PADS 99. (Cat. No. PR00155).
[71] Thomas L. Casavant. Tools and Methods for Visualization of Parallel Systems and Computations - Guest Editor's Introduction, 1993, J. Parallel Distributed Comput.
[72] Jim Gray, et al. The convoy phenomenon, 1979, OPSR.
[73] James Reinders, et al. Intel threading building blocks - outfitting C++ for multi-core processor parallelism, 2007.
[74] Babak Falsafi, et al. The HiPEAC Vision, 2010.
[75] Matthias Hauswirth, et al. Catch me if you can: performance bug detection in the wild, 2011, OOPSLA '11.
[76] Ahmed E. Hassan, et al. Detecting performance anti-patterns for applications developed using object-relational mapping, 2014, ICSE.
[77] Maurice Herlihy, et al. The art of multiprocessor programming, 2020, PODC '06.
[78] Raj Jain, et al. The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling, 1991, Wiley professional computing.
[79] Stuart K. Card, et al. Information foraging in information access environments, 1995, CHI '95.
[80] Peter Hinz, et al. Visualizing the performance of parallel programs, 1996.
[81] Marin Litoiu, et al. A performance evaluation framework for Web applications, 2013, J. Softw. Evol. Process.
[82] Barton P. Miller, et al. What are race conditions? Some issues and formalizations, 1992, LOPL.
[83] Yiannakis Sazeides, et al. Performance implications of single thread migration on a chip multi-core, 2005, CARN.
[84] David Detlefs, et al. Eliminating synchronization-related atomic operations with biased locking and bulk rebiasing, 2006, OOPSLA '06.
[85] L. Snyder, et al. Parallel Programming and the Poker Programming Environment, 1984, Computer.
[86] Gunter Saake, et al. Predicting performance via automated feature-interaction detection, 2012, 2012 34th International Conference on Software Engineering (ICSE).
[87] Timothy G. Mattson, et al. Parallel programming: Can we PLEASE get it right this time?, 2008, 2008 45th ACM/IEEE Design Automation Conference.
[88] Frank Mueller, et al. Cross-Platform Performance Prediction of Parallel Applications Using Partial Execution, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[89] Nick Mitchell, et al. Visualizing the Execution of Java Programs, 2001, Software Visualization.
[90] Ying Zou, et al. An Industrial Case Study on the Automated Detection of Performance Regressions in Heterogeneous Environments, 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.
[91] Barton P. Miller, et al. The Paradyn Parallel Performance Measurement Tool, 1995, Computer.
[92] Ken Kennedy, et al. Optimizing for parallelism and data locality, 1992.