Paper Information - An Automatic Synthesizer of Advising Tools for High Performance Computing

This article presents Egeria, the first automatic synthesizer of advising tools for High-Performance Computing (HPC). Given a set of HPC programming guides as input, Egeria automatically constructs a text retrieval tool that advises on how to improve the performance of a given program. The advising tool presents a concise list of essential rules extracted automatically from the documents and retrieves the optimization knowledge relevant to a user's optimization question. Egeria is built on a multi-layered design that leverages natural language processing (NLP) techniques and extends them with HPC-specific knowledge and considerations. The article covers Egeria's design and implementation, along with quantitative and qualitative evaluation results.
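To make the retrieval idea in the abstract concrete, the following is a minimal sketch of how sentences from a programming guide might be ranked against an optimization question. It is not Egeria's implementation: it assumes a plain TF-IDF vector space model with cosine similarity, and the function names (`build_index`, `retrieve`) and the three guide sentences are hypothetical, written only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(sentences):
    """Return (idf, vectors): smoothed IDF weights and one TF-IDF vector per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for d in docs for term in d)
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in d.items()} for d in docs]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, sentences, idf, vectors, top_k=2):
    """Return up to top_k (score, sentence) pairs with a nonzero score, best first."""
    counts = Counter(tokenize(query))
    # Query terms that never occur in the guide carry no weight.
    qvec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scored = [(cosine(qvec, v), s) for v, s in zip(vectors, sentences)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(score, s) for score, s in scored[:top_k] if score > 0]


# Hypothetical advising sentences, not quoted from any real guide.
guide = [
    "Use shared memory to avoid redundant global memory accesses.",
    "Avoid divergent branches within a warp.",
    "Coalesce global memory accesses whenever possible.",
]

idf, vectors = build_index(guide)
for score, sentence in retrieve("how to reduce global memory accesses", guide, idf, vectors):
    print(f"{score:.3f}  {sentence}")
```

In this sketch, only the two sentences that share terms with the question are returned; the sentence on divergent branches scores zero and is filtered out. The abstract describes a system that goes further, layering NLP analysis and HPC-specific knowledge on top of retrieval, so this example should be read only as the simplest possible baseline.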
