An Automatic Synthesizer of Advising Tools for High Performance Computing

This article presents Egeria, the first automatic synthesizer of advising tools for High-Performance Computing (HPC). Given a set of HPC programming guides as input, Egeria automatically constructs a text retrieval tool that advises on how to improve the performance of a given program. The advising tool provides a concise list of essential rules automatically extracted from the documents, and it retrieves the optimization knowledge relevant to a user's optimization question. Egeria is built on a multi-layered design that leverages natural language processing (NLP) techniques and extends them with HPC-specific knowledge and considerations. The article describes Egeria's design and implementation and reports both quantitative and qualitative evaluation results.
