Performance evaluation of applications for heterogeneous systems by means of performance probes
[1] Lieven Eeckhout, et al. Automated microprocessor stressmark generation, 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[2] Lee Wang, et al. Data Parallel Algorithms, 1994.
[3] Emilio Luque, et al. Tuning Application in a Multi-cluster Environment, 2006, Euro-Par.
[4] Myeong-Cheol Ko, et al. CPOC: Effective Static Task Scheduling for Grid Computing, 2005, HPCC.
[5] David H. Bailey. Unfavorable Strides in Cache Memory Systems (RNR Technical Report RNR-92-015), 1995, Sci. Program.
[6] Brad Calder, et al. Discovering and Exploiting Program Phases, 2003, IEEE Micro.
[7] Zhiling Lan, et al. A fast restart mechanism for checkpoint/recovery protocols in networked environments, 2008, 2008 IEEE International Conference on Dependable Systems and Networks With FTCS and DCC (DSN).
[8] Emilio Luque, et al. Software probes: towards a quick method for machine characterization and application performance prediction, 2008, 2008 International Symposium on Parallel and Distributed Computing.
[9] Barton P. Miller, et al. Dynamic program instrumentation for scalable performance tools, 1994, Proceedings of IEEE Scalable High Performance Computing Conference.
[10] Jeffrey K. Hollingsworth, et al. EMPS: an environment for memory performance studies, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[11] James E. Smith, et al. Comparing program phase detection techniques, 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.
[12] Emilio Luque, et al. Parallel application signature, 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.
[13] Sally A. McKee, et al. Using Dynamic Binary Instrumentation to Generate Multi-platform SimPoints: Methodology and Accuracy, 2008, HiPEAC.
[14] Erich Strohmaier, et al. Apex-Map: A Synthetic Scalable Benchmark Probe to Explore Data Access Performance on Highly Parallel Systems, 2005, Euro-Par.
[15] Brad Calder, et al. Basic block distribution analysis to find periodic behavior and simulation points in applications, 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.
[16] Emilio Luque, et al. Software Probes: A Method for Quickly Characterizing Applications' Performance on Heterogeneous Environments, 2009, 2009 International Conference on Parallel Processing Workshops.
[17] Basilio B. Fraguela, et al. Precise automatable analytical modeling of the cache behavior of codes with indirections, 2007, TACO.
[18] David H. Bailey, et al. The NAS Parallel Benchmarks, 1991, Int. J. High Perform. Comput. Appl.
[19] Emilio Luque, et al. Improving Probe Usability, 2011, 2011 IEEE Workshops of International Conference on Advanced Information Networking and Applications.
[20] Leonid Oliker, et al. Identifying performance bottlenecks on modern microarchitectures using an adaptable probe, 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.
[21] John L. Henning. SPEC CPU2000: Measuring CPU Performance in the New Millennium, 2000, Computer.
[22] Todd M. Austin, et al. The SimpleScalar tool set, version 2.0, 1997, CARN.
[23] Reinhold Weicker, et al. Dhrystone: a synthetic systems programming benchmark, 1984, CACM.
[24] Erich Strohmaier, et al. APEX‐Map: a parameterized scalable memory access probe for high‐performance computing systems, 2007, Concurr. Comput. Pract. Exp.
[25] E. M. O. Junior, et al. Performance prediction and tuning in a multi-cluster environment, 2006.
[26] Jason Duell, et al. Berkeley Lab Checkpoint/Restart (BLCR) for Linux Clusters, 2006.
[27] Jesús Labarta, et al. Performance Modeling of HPC Applications, 2003, PARCO.
[28] Jaspal Subhlok, et al. Replicating memory behavior for performance prediction, 2004.
[29] Lieven Eeckhout, et al. Microarchitecture-Independent Workload Characterization, 2007, IEEE Micro.
[30] Zhiling Lan, et al. FREM: A Fast Restart Mechanism for General Checkpoint/Restart, 2011, IEEE Transactions on Computers.
[31] Srinivas Aluru, et al. Practical Algorithms for Selection on Coarse-Grained Parallel Computers, 1997, IEEE Trans. Parallel Distributed Syst.
[32] Basilio B. Fraguela, et al. Analytical modeling of codes with arbitrary data-dependent conditional structures, 2006, J. Syst. Archit.
[33] Sally A. McKee, et al. Can hardware performance counters be trusted?, 2008, 2008 IEEE International Symposium on Workload Characterization.
[34] Brad Calder, et al. Motivation for Variable Length Intervals and Hierarchical Phase Behavior, 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.
[35] Kevin Skadron, et al. Memory reference reuse latency: Accelerated warmup for sampled microarchitecture simulation, 2003, 2003 IEEE International Symposium on Performance Analysis of Systems and Software. ISPASS 2003.
[36] Emilio L. Zapata, et al. Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance, 2003, IEEE Trans. Computers.
[37] Erich Strohmaier, et al. Quantifying Locality In The Memory Access Patterns of HPC Applications, 2005, ACM/IEEE SC 2005 Conference (SC'05).
[38] Juan Touriño, et al. Automated and accurate cache behavior analysis for codes with irregular access patterns, 2007, Concurr. Comput. Pract. Exp.
[39] Jens Volkert, et al. Adaps - A three-phase adaptive prediction system for the run-time of jobs based on user behaviour, 2011, J. Comput. Syst. Sci.
[40] Janak H. Patel, et al. Accurate Low-Cost Methods for Performance Evaluation of Cache Memory Systems, 1988, IEEE Trans. Computers.
[41] Brad Calder, et al. SimPoint 3.0: Faster and More Flexible Program Phase Analysis, 2005, J. Instr. Level Parallelism.
[42] Peter M. Kogge, et al. On the Memory Access Patterns of Supercomputer Applications: Benchmark Selection and Its Implications, 2007, IEEE Transactions on Computers.
[43] Craig A. Lee, et al. Cluster performance and the implications for distributed, heterogeneous grid performance, 2000, Proceedings 9th Heterogeneous Computing Workshop (HCW 2000) (Cat. No.PR00556).
[44] Solomon W. Golomb, et al. Run-length encodings (Corresp.), 1966, IEEE Trans. Inf. Theory.
[45] Salim Hariri, et al. Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing, 2002, IEEE Trans. Parallel Distributed Syst.
[46] Lieven Eeckhout, et al. Distilling the essence of proprietary workloads into miniature benchmarks, 2008, TACO.
[47] Robert Kroeger, et al. A case study in top-down performance estimation for a large-scale parallel application, 2006, PPoPP '06.
[48] Herb Sutter, et al. The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software, 2013.
[49] Adolfy Hoisie, et al. Scalability analysis of multidimensional wavefront algorithms on large-scale SMP clusters, 1999, Proceedings. Frontiers '99. Seventh Symposium on the Frontiers of Massively Parallel Computation.
[50] Erich Schikuta, et al. Grid Workflow Optimization Regarding Dynamically Changing Resources and Conditions, 2007, GCC.
[51] Jack J. Dongarra, et al. The LINPACK Benchmark: An Explanation, 1988, ICS.
[52] Brad Calder, et al. Picking statistically valid and early simulation points, 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.
[53] Jaspal Subhlok, et al. Skeleton based performance prediction on shared networks, 2004, IEEE International Symposium on Cluster Computing and the Grid, 2004. CCGrid 2004.
[54] George Karypis, et al. Introduction to Parallel Computing, 1994.
[55] Patricia J. Teller, et al. Just how accurate are performance counters?, 2001, Conference Proceedings of the 2001 IEEE International Performance, Computing, and Communications Conference (Cat. No.01CH37210).
[56] Thomas Hérault, et al. MPICH-V Project: A Multiprotocol Automatic Fault-Tolerant MPI, 2006, Int. J. High Perform. Comput. Appl.
[57] Mohammed J. Zaki, et al. Compile-Time Scheduling Algorithms for a Heterogeneous Network of Workstations, 1997, Comput. J.
[58] Ophir Frieder, et al. Clustering and classification of large document bases in a parallel environment, 1997.
[59] Jesús Labarta, et al. A Framework for Performance Modeling and Prediction, 2002, ACM/IEEE SC 2002 Conference (SC'02).
[60] Daniel S. Katz, et al. A comparison of two methods for building astronomical image mosaics on a grid, 2005, 2005 International Conference on Parallel Processing Workshops (ICPPW'05).
[61] Paula Cecilia Fritzsche. Podemos predecir en algoritmos paralelos no deterministas, 2007.
[62] Brad Calder, et al. The Strong correlation Between Code Signatures and Performance, 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.
[63] Joseph Mohan, et al. Experience with Two Parallel Programs Solving the Traveling Salesman Problem, 1983, ICPP.
[64] D. Higgins, et al. T-Coffee: A novel method for fast and accurate multiple sequence alignment, 2000, Journal of molecular biology.
[65] Jaspal Subhlok, et al. Automatic node selection for high performance applications on networks, 1999, PPoPP '99.
[66] Lizy K. John, et al. Performance prediction using program similarity, 2006.
[67] Jorge G. Barbosa, et al. Linear algebra algorithms in a heterogeneous cluster of personal computers, 2000, Proceedings 9th Heterogeneous Computing Workshop (HCW 2000) (Cat. No.PR00556).
[68] Brad Calder, et al. Automatically characterizing large scale program behavior, 2002, ASPLOS X.
[69] Adolfy Hoisie, et al. Performance and Scalability Analysis of Teraflop-Scale Parallel Architectures Using Multidimensional Wavefront Applications, 2000, Int. J. High Perform. Comput. Appl.
[70] Jaspal Subhlok, et al. Automatic construction and evaluation of performance skeletons, 2005, 19th IEEE International Parallel and Distributed Processing Symposium.
[71] Lieven Eeckhout, et al. Performance Evaluation and Benchmarking, 2005.
[72] Qiang Xu, et al. Performance prediction with skeletons, 2008, Cluster Computing.
[73] Rajiv Kapoor, et al. Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation, 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[74] Brad Calder, et al. How to use SimPoint to pick simulation points, 2004, PERV.
[75] Brad Calder, et al. Structures for phase classification, 2004, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004.
[76] Brad Calder, et al. Phase tracking and prediction, 2003, ISCA '03.
[77] Brian A. Wichmann, et al. A Synthetic Benchmark, 1976, Comput. J.