Tuesday, March 10, 2015

RUNNABLE thread - It just looks normal

Its been a very interesting day. I just solved a bug that we were chasing after for quite some time. Generally, we collect failed HTTP requests to certain web services into a dedicated Message Queue. We process the messages in parallel by adding them into a fixed Thread Pool.
It worked successfully 99% of the time and happens that several threads have been started and unfortunately didn't accomplish their job.

We couldn't reproduce the issue and it could happen once a week at 5am :)

I couldn't find any exceptions in the server logs and the severs state were just fine. Performance was not affected, RAM & CPU state were normal.
By reading the logs, I knew that the thread hangs during an HTTP request but I knew that whenever there is no response it returns with the appropriate HTTP status code (i.e. 500).

In order to drill down a bit, I wanted to take a look at the JVM of the specific process and for that purpose I used jVisualVM. I couldn't identify anything special - memory state was fine, couldn't identify any deadlocks, parking threads, etc. so decided to generate a Thread Dump:



then passed through the above thread which is in a RUNNABLE state while infinitely waiting for the socket reader to finish its job.
It turn out that even though you set timeout properly (connection and socket timeouts), it can still hang in an infinite state. Since we are using the Google http client API to access several Google services, I couldnt get into the underlying connection manager to cancel the request so I had to wrap the request in a single Future<V> thread with a reasonable timeout that can force an abort to existing request and add the failed message back into the Message Queue.

Finally its all stable again...hmmm lets deal with the next challenge :)



No comments: