Automatically detecting and fixing concurrency bugs in Go software systems

Go is a statically typed programming language designed for efficient and reliable concurrent programming. For this purpose, Go provides lightweight goroutines and recommends passing messages using channels as a less error-prone means of thread communication. Go has become increasingly popular in recent years and has been adopted to build many important infrastructure software systems. However, a recent empirical study shows that concurrency bugs, especially those due to misuse of channels, exist widely in Go. These bugs severely hurt the reliability of Go concurrent systems. To fight Go concurrency bugs caused by misuse of channels, this paper proposes a static concurrency bug detection system, GCatch, and an automated concurrency bug fixing system, GFix. After disentangling an input Go program, GCatch models the complex channel operations in Go using a novel constraint system and applies a constraint solver to identify blocking bugs. GFix automatically patches blocking bugs detected by GCatch using Go’s channel-related language features. We apply GCatch and GFix to 21 popular Go applications, including Docker, Kubernetes, and gRPC. In total, GCatch finds 149 previously unknown blocking bugs due to misuse of channels and GFix successfully fixes 124 of them. We have reported all detected bugs and generated patches to developers. So far, developers have fixed 125 blocking misuse-of-channel bugs based on our reporting. Among them, 87 bugs are fixed by applying GFix’s patches directly.
