Thread profiling in Java 5: Synchronization issues (2)

Reading the thread profile data: synchronization issues (2)

On the previous page, we discussed some locking options to allow our profile thread and another reader thread to access the data correctly. We decided on an implementation using AtomicInteger, although we mentioned that synchronized or a ConcurrentHashMap could provide (slightly heavier-handed) solutions.

So back to our AtomicInteger which will represent our lock. When it has a value of 0, that means that no thread is using the profile data. When it has a value of 1, that means that the data is in use and we need to wait for it to return to zero. So our profiler run() method looks something like this:

private AtomicInteger lock = new AtomicInteger(0);
...
    while (!terminateRequest) {
      long[] threadIds = threadBean.getAllThreadIds();
      ThreadInfo[] infs = threadBean.getThreadInfo(threadIds, maxDepth);
      
      while (!lock.compareAndSet(0, 1)) {
        Thread.sleep(1);
      }
      try {
        // ... process infs ...
      } finally {
        lock.set(0);
      }

      Thread.sleep(profileInterval);
    }

In the vast majority of cases, acquiring the lock will more or less boil down to a simple CAS operation. If there really is contention (because another thread is reading the profile data), we know that it will be done with it fairly quickly, but probably not so quickly that we want to re-try the compareAndSet() immediately. So we sleep for the smallest amount of time we can¹ before trying again. This isn't a fantastic situation– "a millisecond is a long time in computing"– but without knowing that the other thread will finish in the time it takes to do a small number of further CAS operations (and thinking that it won't), it's the most appropriate thing we can do². Other options would be:

use Thread.yield(), with the proviso that its precise operation varies from system to system and even between Hotspot versions (see the aforementioned link for more details);
allow the reading thread to be notified when the profiling data is ready: for our purposes— in principle, an elegant solution, but probably overengineering for a situation that will rarely occur.

In terms of locking, the method that reads the data (called in a different thread) will perform a similar operation, sleeping for 1 millisecond if it can't acquire the lock immediately. So now let's consider how we'll read the data and present it back to the caller.

And now for... actually reading the data...!

We need to make sure that the caller getting the profile data sees a consistent "view". Although we've acquired a lock on the data map overall, if we simply pass back mutable objects (such as instances of our ProfileInfo class) from that map to the caller, then the caller risks seeing a changing or corrupted version of a particular piece of data (count). Apart from just being plain wrong, this will create unpredictable effects when, say, trying to order the profile data by count.

A better solution is to return a copy of the data, and in so doing we can actually make the task of ordering the data easier. So we create a comparable class called LineAndCount:

  public static class LineAndCount implements Comparable<LineAndCount> {
    private final String className, methodName;
    private final int lineNo, tickCount;
    private LineAndCount(StackTraceElement lineInfo, ProfileInfo count) {
      this.className = lineInfo.getClassName();
      this.methodName = lineInfo.getMethodName();
      this.lineNo = lineInfo.getLineNumber();
      this.tickCount = count.tickCount;
    }
    public String getClassName() { return className; }
    // ... plus other accessor methods ...

    public int compareTo(LineAndCount o) {
      return (tickCount < o.tickCount) ? -1 :
               ((tickCount > o.tickCount) ? 1 : 0);
    }
    public String toString() {
      return String.format("%5d %s.%s:%d", tickCount, className,
           methodName, lineNo);
    }
  }

Now our profiler can provide a method that returns a list of LineAndCount objects. We put this method inside our profiler class, ready to be called from another thread that wants to read the profiling data:

  public List getProfile() {
    while (!lock.compareAndSet(0, 1)) {
      try {
        Thread.sleep(1);
      } catch (InterruptedException iex) {
        Thread.currentThread().interrupt();
        throw new RuntimeException("Unexpected interruption waiting for lock");
      }
    }
    List res;
    try {
      res = new ArrayList(lineCounts.size());
      for (Map.Entry e : lineCounts.entrySet()) {
        StackTraceElement line = e.getKey();
        ProfileInfo profCnt = e.getValue();
        res.add(new LineAndCount(line, profCnt));
      }
    } finally {
      lock.set(0);
    }
    Collections.sort(res);
    return res;
  }

And now the caller, to display a list of the 40 most time-consuming points of execution, can call:

List<ThreadProfiler.LineAndCount> counts = prof.getProfile();
int sz = counts.size();
int noCounts = Math.min(40, sz);
for (int i = 1; i <= noCounts; i++) {
  LineAndCount lc = counts.get(sz-i);
  System.out.println(toString());
}

And there we have a very basic profiler. Some enhancements to think about:

when deciding on the "interesting" stack trace element, we could skip past classes in the API library (i.e. not ones in our application's outer package);
we could take into account CPU time, and list a profile showing methods/lines by actual CPU time consumed as opposed to "raw" time.
we wouldn't want the map of profiling information to grow unbounded (yes, I have crashed my own server by letting it do this). Possible solutions are to dump and clear the data every now and then, or check for the map getting to a certain size and then clearing out lines with counts below a certain threshold.

1. As of Java 5, it is possibly to request a sleep of less than a second, but as far as I'm aware no current mainstream OS can grant less than a second. On the other hand, Windows and Linux generally can grant a sleep of one second. For more details, see the section on Thread.sleep().
2. Interestingly, and depending on the platform, a CAS operation may be quite a bit slower than a "normal" data read. So just possibly we may actually be better off repeating the CAS a few times before sleeping. I've not actually profiled this and am essentially guestimating that sleeping for a second is the appropriate action, given that iterating through a large-ish set of profiling data isn't completely trivial. This is a case where a sophisticated synchronized implementation that did such profiling on the fly could possibly work in our favour.

If you enjoy this Java programming article, please share with friends and colleagues. Follow the author on Twitter for the latest news and rants. Follow @BitterCoffey