Java trash talk

In this post, I'll discuss why garbage collection is Java's trump card when talking about performance.

Them's fighting words
I recently conducted a technical interview for a Java developer. The candidate made a pretty bold statement: he believed that the biggest design error in Java was that garbage collection was automated. At first, I was stunned by this trash talk aimed at Java. What's more, those words were uttered in a Java-based shop, for a Java-based job, in front of Java-biased techies!

There are some things in life you just take for granted and stop questioning after a while. Things like:

  • Gravity guarantees I will not be swept into outer space as I type this blog
  • A continuous stream of electricity guarantees I can type this sentence
  • Garbage collection destroys my objects once I'm finished with them

When someone comes along and asserts that a fundamental commodity (gravity, electricity or garbage collection) is a design error, you have to take a step back before you can even begin to formulate a coherent counter-argument.


Misleading comparisons: C++ vs Java
There is still a contingent of people out there who will bring up C++ when discussing performance in Java. I believe that doing so leads to missing the forest for the trees. For the record, it's true that if all we're talking about is brute speed, in all likelihood, a sorting algorithm written in C++ will perform better than the same written in Java. However, automated garbage collection is the trump card that tips the balance in Java's favour.


Garbage collection and memory leaks
We can take it as given that automated garbage collection was, for practical purposes, a solved problem by the turn of the 21st century. This is one of those instances where, after decades of research, a mainstream language has come along to allow designers to stop thinking about memory reclamation and move higher up the stack. At the very least, automated garbage collection removes the most common kind of memory leak: memory that is allocated and never freed. In large-scale applications with always-on uptime requirements, this alone creates an almost insurmountable advantage in favour of Java's garbage collection. Memory leaks can still occur in Java, typically when a long-lived object keeps references to objects that are no longer needed, but that is a higher-quality problem. It's the difference between "do I have enough money to put food on the table" and "do I have enough money for marble counter tops in my yacht". Finding memory leaks in a large-scale C++ application is more like the first kind of problem. Even well-designed C++ applications can have hard-to-find leaks that threaten uptime. Applications with leaks eventually run out of memory and need to be restarted to recover the lost memory. I don't know about you, but I'd rather be worried about marble counter tops than food.
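To make the "higher-quality problem" concrete, here is a minimal sketch of how a leak can still happen in Java. The class `SessionCache` and its methods are hypothetical names invented for this illustration. The collector cannot reclaim an entry while the long-lived map still refers to it, so forgetting to call `close` leaks memory even though nothing was ever "freed" incorrectly.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: a long-lived cache that keeps objects reachable.
class SessionCache {
    // As long as an entry sits in this map, the garbage collector must keep it.
    private final Map<String, byte[]> sessions = new HashMap<>();

    void open(String id) {
        sessions.put(id, new byte[1024]); // 1 KB of state per session
    }

    // The fix for the leak: drop the reference once the session is over.
    void close(String id) {
        sessions.remove(id);
    }

    int retained() {
        return sessions.size();
    }

    public static void main(String[] args) {
        SessionCache cache = new SessionCache();
        for (int i = 0; i < 1000; i++) {
            cache.open("session-" + i);
        }
        // If close() is never called, all 1000 buffers stay reachable forever.
        System.out.println("retained after opening: " + cache.retained());
        for (int i = 0; i < 1000; i++) {
            cache.close("session-" + i);
        }
        System.out.println("retained after closing: " + cache.retained());
    }
}
```

The debugging task here is finding which reference is keeping the objects alive, which is usually a far gentler problem than hunting for a missing or doubled `delete`.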


Performance bottlenecks
More importantly, garbage collection is the crown jewel of Java when it comes to performance. Java's garbage collection enables efficiencies in heap management. Destruction of objects in C++ is done synchronously: the thread that deletes an object runs its destructor and hands the memory back to the allocator (which may in turn call into the O/S) before it can carry on. In Java, reclamation is handled by the garbage collector, typically on separate threads. This not only removes a potential bottleneck that can occur when many application threads contend for a centralized resource (memory management, in this case), it also lets the calling thread move on faster because it never destroys the object itself. Given that garbage collection is done by a separate, non-application component, this bottleneck is largely removed, although some collectors still need to pause application threads briefly while they work.


Collection parallelism
Garbage collectors can also take advantage of multi-CPU, multi-core machines to parallelize garbage collection. Depending on the GC strategy configured, collection can be performed either by GC threads that run concurrently with the application or by pausing the application and running one GC thread on each core in parallel.
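To see which strategy a given JVM is actually using, the standard `java.lang.management` API can list the active collectors. The sketch below is only an illustration; the class name `GcReport` is made up for this post, and the collector names printed depend on the JVM and on the flags it was started with (on HotSpot, for example, `-XX:+UseParallelGC` or `-XX:+UseG1GC`, where the JVM version supports them).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that reports which collectors this JVM is running.
class GcReport {
    // Names of the collectors registered with the platform MBean server.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Number of cores the JVM can use, which bounds how parallel a collection can be.
        System.out.println("cores visible to the JVM: "
                + Runtime.getRuntime().availableProcessors());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and time are cumulative since JVM start; -1 means "not available".
            System.out.println(gc.getName() + ": "
                    + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms");
        }
    }
}
```

Running the same program under different GC flags is a quick way to confirm that a configuration change took effect.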


Most objects die young
Generational garbage collectors are predicated on the observation that most objects die young. By young, I mean a lifetime measured in milliseconds. With this in mind, a garbage collector can partition the heap into multiple spaces or generations. These are known as the young generation (also called the nursery; the area where new objects are first allocated is known as eden) and the old generation. As the names suggest, objects are segregated by age. Given that most objects die young, they can be created in the young generation and die without ever being manipulated by the garbage collector. It is then easier to move only the minority of surviving objects one by one into the other half of the young generation and to simply reclaim everything left behind in one fell swoop. This is more efficient than individually removing a majority of objects one entry at a time. With this in mind, programmers need not use error-prone techniques such as object pooling only as a means to improve memory allocation efficiency. In fact, in Java it is usually better to do nothing than to implement object pooling. (Of course, if creating an object is time-consuming for reasons other than memory allocation, then pooling is fine.)
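The idea of copying the few survivors and discarding the rest in one step can be sketched with a toy model. This is not how any real JVM is implemented; `ToyNursery`, its integer "objects" and the explicit list of live slots are all assumptions made for illustration (a real collector discovers live objects by tracing references from roots). What the sketch does show is that the work done by a collection is proportional to the number of survivors, not to the number of dead objects.

```java
// Toy model of a young-generation collection (illustrative only).
class ToyNursery {
    private final int[] eden;      // where new "objects" are allocated
    private int top = 0;           // bump pointer: next free eden slot
    private final int[] survivor;  // the "other half" that receives live objects
    private int survivorTop = 0;

    ToyNursery(int capacity) {
        eden = new int[capacity];
        survivor = new int[capacity];
    }

    // Allocation is just a pointer bump; returns the slot that was used.
    int allocate(int value) {
        if (top == eden.length) {
            throw new IllegalStateException("eden full; a real VM would collect here");
        }
        eden[top] = value;
        return top++;
    }

    // Copies only the live slots, then reclaims all of eden by resetting the pointer.
    // Returns the number of objects copied, i.e. the work actually done.
    // (The toy does not handle survivor-space overflow across repeated collections.)
    int minorCollect(int[] liveSlots) {
        int copied = 0;
        for (int slot : liveSlots) {
            survivor[survivorTop++] = eden[slot];
            copied++;
        }
        top = 0; // every dead object disappears here without being visited
        return copied;
    }

    int used() {
        return top;
    }

    int survivors() {
        return survivorTop;
    }

    public static void main(String[] args) {
        ToyNursery nursery = new ToyNursery(1000);
        for (int i = 0; i < 1000; i++) {
            nursery.allocate(i);
        }
        // Pretend only two of the 1000 objects are still referenced.
        int copied = nursery.minorCollect(new int[] {10, 500});
        System.out.println("copied " + copied + ", eden in use: " + nursery.used());
    }
}
```

In this model, 998 dead objects cost nothing to reclaim, which is why pooling short-lived objects rarely pays off.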


Dark matter matters
Fragmentation is also a problem that can be solved with automated garbage collection. Much like fragmentation on a file system, one of the major impediments to managing memory is the dark matter inside the heap. Dark matter is the free space between objects that is too small to be usable. For an always-up application that requires responsiveness approaching real-time, dark matter can grow over time and kill the application. Individual dark corners of the heap, summed together, can end up consuming large quantities of memory. Eventually, the runtime can even run out of memory. To counter this, garbage collectors can de-fragment memory by compacting segments of the heap. Individual objects are pushed together so as to squeeze out the dark matter. This not only frees up memory, it also makes allocation more efficient. Requests for new memory can be served quickly if there is ample contiguous free memory in the heap.
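A toy model makes the effect of compaction easy to see. The sketch below assumes a heap represented as an array of slots that are either used or free; `ToyHeap` and its methods are hypothetical names, and a real collector would also have to update every reference to the objects it moves. The total amount of free memory is the same before and after, but only after compaction is it contiguous enough to satisfy a large request.

```java
import java.util.Arrays;

// Toy model of heap compaction (illustrative only).
class ToyHeap {
    // Length of the longest run of consecutive free slots.
    static int largestFreeRun(boolean[] used) {
        int best = 0;
        int run = 0;
        for (boolean slotUsed : used) {
            run = slotUsed ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    // Slides all used slots to the front of the heap, leaving one free block at the end.
    static boolean[] compact(boolean[] used) {
        int live = 0;
        for (boolean slotUsed : used) {
            if (slotUsed) {
                live++;
            }
        }
        boolean[] compacted = new boolean[used.length];
        Arrays.fill(compacted, 0, live, true);
        return compacted;
    }

    public static void main(String[] args) {
        // Half the heap is free, but every free slot is isolated "dark matter".
        boolean[] heap = {true, false, true, false, true, false, true, false};
        System.out.println("largest free run before: " + largestFreeRun(heap));          // 1
        System.out.println("largest free run after: " + largestFreeRun(compact(heap)));  // 4
    }
}
```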


Conclusion
Garbage collection is a huge asset for Java. In fact, it is largely responsible for the success of Java in the enterprise application space. It allows Java to go head to head with the likes of C++ even with the latter's inherent advantage of having native access to the O/S. Looks like there's gold in that garbage.
