How to investigate deadlock issues
- Collect javacores
- Look for deadlock messages
- Look for threads in waiting status
- Look for threads that own monitor locks
- Compare threads and compare monitors
Step 1: Collect javacores
For a typical hang that may be caused by a deadlock, collect at least 3 javacore dumps, 5 minutes apart. The best way to gather this data is to follow the instructions in the WAS MustGather technical document for your platform, for example the one for WAS hang issues on AIX.
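As a rough sketch of the timing described above, the snippet below sends SIGQUIT (the equivalent of `kill -3`) to a JVM process three times, five minutes apart. It assumes an IBM JVM on a Unix-like system such as AIX, where SIGQUIT produces a javacore by default. The function name `request_javacores` and its parameters are hypothetical, and the MustGather procedure remains the authoritative way to collect the data.

```python
import os
import signal
import time


def request_javacores(pid, count=3, interval_seconds=300,
                      send=os.kill, sleep=time.sleep):
    """Ask a JVM for `count` thread dumps, `interval_seconds` apart.

    Assumption: the target is an IBM JVM on a Unix-like system, where
    SIGQUIT (kill -3) triggers a javacore by default.
    `send` and `sleep` are injectable so the schedule can be dry-run.
    """
    sent = 0
    for i in range(count):
        send(pid, signal.SIGQUIT)   # same effect as: kill -3 <pid>
        sent += 1
        if i < count - 1:           # no need to wait after the last dump
            sleep(interval_seconds)
    return sent
```

Calling `request_javacores(12345)`, where 12345 stands for the application server's process ID, would block for about ten minutes and request three dumps.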
Step 2: Look for deadlock messages
Open the javacores with any text editor and search for the string "Deadlock detected".
- If the message appears, review the code reported in the stack traces of the threads involved.
- If it does not appear, the built-in detection did not find any deadlock. That does not prove that no deadlock is occurring, only that none was detected automatically, so continue the investigation with Step 3.
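When several javacores have been collected, the search can be scripted. The following is a minimal sketch that scans each file for the "Deadlock detected" string and reports the matching line numbers. The function name `find_deadlock_messages` is hypothetical.

```python
DEADLOCK_MARKER = "Deadlock detected"


def find_deadlock_messages(javacore_paths):
    """Return {path: [(line_number, line_text), ...]} for every javacore
    that contains the marker string. Files without a match are omitted."""
    hits = {}
    for path in javacore_paths:
        matches = []
        # javacores are plain text; replace undecodable bytes rather than fail
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if DEADLOCK_MARKER in line:
                    matches.append((number, line.rstrip("\n")))
        if matches:
            hits[str(path)] = matches
    return hits
```

An empty result only means that the string was not found. As noted above, the investigation should then continue with Step 3.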
Step 3: Look for threads in waiting status
Open the javacores with the IBM tool "Thread and Monitor Dump Analyzer". Find the Thread Status Analysis in the report generated for the first javacore:
Pay particular attention to the number of "Waiting on condition" and "Blocked" threads. At this point it is useful to review the stack traces of the waiting and blocked threads to understand what they are waiting for and why. Points to investigate are:
- custom code involved;
- external resources threads could wait on;
- threads status during different javacores;
- lock data.
Lock information is the key to finding a deadlock. A lock is a resource that can be owned by only one thread at a time. Other threads waiting for that lock are blocked until the thread that owns it releases it.
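This ownership rule is what makes a deadlock possible: if thread T1 owns lock L1 and waits for L2 while thread T2 owns L2 and waits for L1, neither can proceed. The sketch below illustrates the idea as a cycle search over "who owns what" and "who waits for what". The data structures and the function name `find_deadlock_cycles` are hypothetical and not part of any IBM tool.

```python
def find_deadlock_cycles(lock_owner, waiting_for):
    """Find cycles of threads that wait on each other.

    lock_owner:  {lock: thread that currently owns it}
    waiting_for: {thread: lock it is blocked on}
    Returns a list of cycles, each a list of thread names.
    """
    cycles = []
    seen_in_cycle = set()
    for start in waiting_for:
        if start in seen_in_cycle:
            continue
        path = []
        position = {}
        thread = start
        # Follow thread -> wanted lock -> owner until the chain ends or repeats
        while thread is not None and thread not in position:
            position[thread] = len(path)
            path.append(thread)
            lock = waiting_for.get(thread)
            thread = lock_owner.get(lock) if lock is not None else None
        if thread is not None:  # walked back onto the path: a cycle
            cycle = path[position[thread]:]
            if not seen_in_cycle.intersection(cycle):
                cycles.append(cycle)
            seen_in_cycle.update(cycle)
    return cycles
```

A thread that merely waits for a lock held by a busy but progressing thread produces no cycle here, which is why a long wait alone does not prove a deadlock.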
Step 4: Look for threads that own monitor locks
Thread and Monitor Dump Analyzer shows which threads are holding locks on resources. They are identified with special icons:
The icon is the monitor image just beside the name of the thread. When you select the thread, you can see what it is locking.
For example, this thread owns the monitor lock on com/ibm/ws/util/BoundedBuffer$GetQueueLock@0x0...
Continue the investigation by looking for threads that hold a monitor lock for a long time, and check whether a significant number of threads are waiting on or blocked by them.
Step 5: Compare threads and monitors
A possible further step is to compare the threads and the monitors across the different javacores. Thread and Monitor Dump Analyzer makes this easy: select the collected javacores, then choose "Analysis > Compare threads" or "Analysis > Compare monitors".
Compare threads with Thread and Monitor Dump Analyzer
This feature shows the status of each thread at the time of each collected javacore, which is why it is important to collect several javacores. The best way to describe it is with an example:
The image above shows 4 threads and their status across 4 different javacores (the columns). The border color indicates the status (green is Running) and the background color (red in this case) marks a suspect thread. Several conclusions can be drawn from it. For example, the WebContainer thread is blocked in the first two javacores but is then only waiting on condition in the next two, for different tasks (you can tell that the tasks differ by looking at the thread's stack trace, which Thread and Monitor Dump Analyzer shows on the right side of the window). Because the thread is working on a different task in different javacores, this thread (or rather its pool) should not be considered a suspect.
The analysis should therefore focus on threads (WebContainer threads in particular) that are waiting or blocked from the first javacore to the last while requesting the same operation.
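The rule of thumb above (stuck in every javacore while requesting the same operation) can be written down as a small filter. This is a sketch over already-parsed data, not a description of how Thread and Monitor Dump Analyzer works internally. The input layout, the state labels and the function name `suspect_threads` are assumptions made for illustration.

```python
STUCK_STATES = {"Blocked", "Waiting on condition"}


def suspect_threads(snapshots):
    """Return threads that look stuck on the same operation in every dump.

    snapshots: list (oldest first) with one dict per javacore, each
    mapping thread_name -> (state_label, stack), where stack is a
    tuple of frame strings.
    """
    if not snapshots:
        return []
    suspects = []
    for name, (state, stack) in snapshots[0].items():
        if state not in STUCK_STATES:
            continue
        # The thread must exist, still be stuck, and show the same stack
        # in every later dump to count as a suspect.
        if all(name in snap
               and snap[name][0] in STUCK_STATES
               and snap[name][1] == stack
               for snap in snapshots[1:]):
            suspects.append(name)
    return sorted(suspects)
```

A thread that is blocked in early dumps and waiting on a different stack later, like the WebContainer thread in the example, is filtered out because its stack changes between dumps.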
Compare monitors with Thread and Monitor Dump Analyzer
At this point the investigation can continue, or the earlier findings can be confirmed, using the compare monitors feature:
The image above shows the tool flagging a suspect thread, DRSTHreadPool: DMN0, that holds a specific monitor lock across the last three javacores. The Waiting Threads tab lists the threads waiting for this thread to release the lock. This thread is therefore a good candidate for investigation: look at its stack trace and try to understand what it is doing and why it does not release the lock during the last 3 javacores.
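The same comparison can be approximated on parsed data: for each monitor, count in how many consecutive javacores the same thread appears as its owner. The sketch below assumes one `{monitor: owner}` mapping per javacore. The function name `long_held_monitors` and the threshold parameter are hypothetical.

```python
def long_held_monitors(snapshots, min_consecutive=2):
    """Find monitors owned by the same thread in consecutive dumps.

    snapshots: list (oldest first) with one dict per javacore, each
    mapping monitor -> owning thread name.
    Returns {(monitor, owner): longest run of consecutive dumps}.
    """
    longest = {}
    running = {}
    for snap in snapshots:
        # A run continues only if the same owner held the monitor
        # in the previous dump as well; otherwise it restarts at 1.
        running = {(monitor, owner): running.get((monitor, owner), 0) + 1
                   for monitor, owner in snap.items()}
        for key, run in running.items():
            longest[key] = max(longest.get(key, 0), run)
    return {key: run for key, run in longest.items()
            if run >= min_consecutive}
```

Seeing the same owner in consecutive dumps does not prove the lock was held continuously in between, so the result is a list of candidates whose stack traces still need to be read.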
Unfortunately, detecting a hang or a deadlock is a task that can take time to investigate. The most important things are:
- detect the suspect threads (using the steps described above);
- investigate what those threads are doing (looking at the stack trace);
- find the resources involved and understand why those threads are causing a possible issue.