Just-in-time and just-in-place deadlock resolution

Deadlocked threads cannot make progress, and they frequently tie up resources requested by other threads, bringing more and more threads to a standstill. A deadlock should therefore not remain undetected and uncorrected for long. Running deadlock detection too frequently, however, wastes valuable system resources, so it is important to choose the right interval between successive detections. Deadlock recovery must follow detection to release the resources held in the cyclic wait. Beyond restarting the entire system, it is desirable that programmers be able to implement fine-grained recovery actions, such as releasing a resource that is currently not in use. Such actions often require knowledge of program contexts and deadlock states. Unfortunately, modern programming languages lack language-level support for signaling deadlock conditions and for structuring resolution code.
My thesis has two parts. First, under the assumption that the time to the first deadlock in the system (after a system restart) follows an exponential distribution, a reinforcement-learning approach is effective in scheduling deadlock detection for a restart-oriented system. Second, runtime exceptions are a programming abstraction that allows programmers to write fine-grained deadlock recovery code.
My approach treats deadlock-detection scheduling as reinforcement learning: it estimates the deadlock rate and then performs an optimization to find the detection interval that maximizes system utility. This technique is proved to find the best tradeoff, and experimental results suggest that the exponential distribution is a reasonable approximation for the time to the first deadlock in the system (after a system restart).
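The estimate-then-optimize idea can be illustrated with a minimal sketch. It is not the thesis's actual utility function or learning procedure. It assumes fully observed times to first deadlock, a fixed cost per detection run, and a first-order overhead model (detection cost per unit time plus the expected time a deadlock goes unnoticed), which is valid only when the interval is short relative to the mean time to deadlock. The class name, method names, and sample numbers are all hypothetical.

```java
// Sketch only: the overhead model and all names here are illustrative assumptions.
final class DetectionIntervalSketch {

    // Maximum-likelihood estimate of an exponential rate: n / (sum of observed times).
    static double estimateRate(double[] timesToFirstDeadlock) {
        double total = 0.0;
        for (double t : timesToFirstDeadlock) {
            total += t;
        }
        return timesToFirstDeadlock.length / total;
    }

    // Assumed first-order overhead per unit time:
    //   detectionCost / interval  -> time spent running detection
    //   rate * interval / 2       -> expected time a deadlock sits undetected
    static double approxOverhead(double rate, double detectionCost, double interval) {
        return detectionCost / interval + rate * interval / 2.0;
    }

    // Setting the derivative of approxOverhead to zero gives sqrt(2 * cost / rate).
    static double bestInterval(double rate, double detectionCost) {
        return Math.sqrt(2.0 * detectionCost / rate);
    }

    public static void main(String[] args) {
        double[] observed = {100.0, 200.0, 300.0}; // made-up times to first deadlock
        double detectionCost = 0.1;                // made-up cost of one detection run
        double rate = estimateRate(observed);
        double interval = bestInterval(rate, detectionCost);
        System.out.printf("rate=%.4f interval=%.3f overhead=%.4f%n",
                rate, interval, approxOverhead(rate, detectionCost, interval));
    }
}
```

Under this assumed model, a higher estimated deadlock rate shortens the interval and a higher detection cost lengthens it, which matches the tradeoff described above. A real system would re-estimate the rate after each restart rather than from a fixed sample.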
It is natural to consider deadlock occurrences as runtime exceptions because at runtime it is relatively easy to detect actual deadlock occurrences, which represent not only abnormal states but also fatal errors. Thus, exception handlers can be used to resolve deadlock occurrences based on deadlock states and program contexts. Furthermore, because exceptions are a widely used language concept, the technique of deadlock resolution via exceptions is intuitive and practical.
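As a rough illustration of resolving a deadlock in an exception handler, the sketch below models threads and resources as plain strings in a toy lock table. It is a single-threaded simulation under stated assumptions, not the thesis's runtime mechanism: `DeadlockException` and `LockTable` are hypothetical names, and the Java standard library provides no such exception.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: a single-threaded model of lock requests; all types are hypothetical.
final class DeadlockExceptionSketch {

    // Unchecked exception carrying the deadlock state the handler may need.
    static final class DeadlockException extends RuntimeException {
        final String thread;
        final String resource;

        DeadlockException(String thread, String resource) {
            super(thread + " would deadlock waiting for " + resource);
            this.thread = thread;
            this.resource = resource;
        }
    }

    // Toy lock table: one owner per resource, at most one awaited resource per thread.
    static final class LockTable {
        private final Map<String, String> ownerOf = new HashMap<>();    // resource -> thread
        private final Map<String, String> waitingFor = new HashMap<>(); // thread -> resource

        // Returns true if granted, false if the thread must wait;
        // throws DeadlockException if waiting would close a wait-for cycle.
        boolean request(String thread, String resource) {
            String owner = ownerOf.get(resource);
            if (owner == null || owner.equals(thread)) {
                ownerOf.put(resource, thread);
                waitingFor.remove(thread);
                return true;
            }
            // Walk the chain: owner -> resource it awaits -> that resource's owner ...
            String t = owner;
            while (t != null) {
                if (t.equals(thread)) {
                    throw new DeadlockException(thread, resource);
                }
                String awaited = waitingFor.get(t);
                t = (awaited == null) ? null : ownerOf.get(awaited);
            }
            waitingFor.put(thread, resource);
            return false;
        }

        // Releases the resource only if this thread actually owns it.
        void release(String thread, String resource) {
            ownerOf.remove(resource, thread);
        }
    }

    // T1 holds A and waits for B; T2 holds B and then asks for A.
    static boolean demo() {
        LockTable locks = new LockTable();
        locks.request("T1", "A");
        locks.request("T2", "B");
        locks.request("T1", "B"); // T1 must wait for T2
        try {
            locks.request("T2", "A"); // would close the cycle
            return false;
        } catch (DeadlockException e) {
            // Fine-grained recovery: T2 gives up B, which it is not currently using.
            locks.release(e.thread, "B");
            return locks.request("T1", "B"); // T1 can now proceed
        }
    }

    public static void main(String[] args) {
        System.out.println("recovered: " + demo());
    }
}
```

The handler uses both the program context (it knows that `B` is safe to give up) and the deadlock state carried by the exception, which is the combination the paragraph above describes. In a real runtime the exception would be raised in one of the deadlocked threads rather than in a simulated request.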
