Is garbage collection a failed concept?

In this article, I'll discuss garbage collection and the value it brings to large-scale application development.

Quick - call my stockbroker!
There's been a lot of noise about the next version of standard C++. It's due in 2009 and has been named C++0x. (Where's a brand-name expert when you need one?) C++ thought leaders have been let loose and are out in full force discussing its merits. Caught in the crossfire of this marketing blitz is a renewed criticism of Java, its performance and its garbage collection. Comparing Java to C++ feels so late-1990s that I suddenly feel the urge to lose money on shares of an Internet start-up. (Pet food? A P/E ratio of 87? This start-up can't miss!)

Garbage collection bad, RAII good
One such example is an interview in which an internationally recognized C++ expert was asked whether garbage collection should be added to C++0x. His stunning reply was that garbage collection was a "failed and improperly implemented concept". Paraphrased, his argument is that garbage collection is bad because it is non-deterministic and may not actually succeed. It deals with only one kind of resource: memory. All other resources, such as file handles and database connections, remain unaddressed. Garbage collection also robs developers of destructors with which to manage resources. In summary, garbage collection is a crutch for developers, and if we all just follow the resource acquisition is initialization (RAII) idiom, we'll do just fine.

The point of my article is neither to dissect nor to critique the opinion expressed above, but rather to address some general misconceptions about garbage collection.

How real should real-time be?
Garbage collection critics love to point out that GC is bad because it is non-deterministic. It is true that garbage collection removes some degree of application determinism. Since it is automated, collection can occur at any time, run for varying lengths of time and impinge on the responsiveness of the application running on top of it. This can break functionality in an application that is required to respond in real time. Garbage collection also circumvents the need for destructors. Consequently, without destructors, programmers cannot hook into resource deallocation and thus cannot precisely predict when a resource is released. Garbage collection therefore affects application determinism in two ways: responsiveness and resource management.

I have two points of contention with this. First, let's be honest: how many applications have hard real-time requirements for responsiveness? The majority of applications have either no real-time requirements or soft ones. The latter are exemplified by online stores, stock trading applications, and GUI-centric apps, where fast response times are necessary but an occasional processing delay is tolerated. Second, even with C++, application determinism can only be guaranteed in systems with no shared resources (such as embedded systems). Most applications run atop a multi-process operating system where CPU and I/O resources are shared. The illusion of application determinism quickly evaporates in these systems, where the application has no control over how resources are shared. In the end, giving up some degree of determinism in return for automated garbage collection is a trade I'd make any day. There is a big price to be paid in complexity to obtain the extra level of determinism promised by C++. More on this point later.

Give RAII a chance
Some believe that RAII is a better way to manage memory. This idiom, perhaps the most important in C++, is the technique of acquiring a resource in an object's constructor and releasing it in that object's destructor. Provided all classes are designed to conform to this pattern, life will be good.
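To make the idiom concrete, here is a minimal C++ sketch. The `LoggedResource` class and `use_resources` function are hypothetical; the "resource" is only an entry in a log, so the acquire and release points are visible. It illustrates the property RAII advocates value: release happens at a known point (scope exit), in reverse order of acquisition, even when an exception propagates.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical resource wrapper following RAII: the constructor acquires,
// the destructor releases. The "resource" is only a log entry, so the
// points of acquisition and release can be observed.
class LoggedResource {
public:
    LoggedResource(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {
        log_.push_back("acquire " + name_);
    }

    ~LoggedResource() { log_.push_back("release " + name_); }

    // Copying would release the same resource twice, so forbid it.
    LoggedResource(const LoggedResource&) = delete;
    LoggedResource& operator=(const LoggedResource&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Acquires two resources and optionally throws. On either path (provided
// the caller catches the exception), both destructors run at scope exit,
// in reverse order of construction.
void use_resources(std::vector<std::string>& log, bool fail) {
    LoggedResource file("file", log);
    LoggedResource connection("connection", log);
    if (fail) {
        throw std::runtime_error("simulated failure");
    }
}
```

Calling `use_resources(log, false)` leaves `log` holding "acquire file", "acquire connection", "release connection", "release file". With `fail` set to true and the exception caught by the caller, the same two releases still happen during stack unwinding.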

However, there is a big difference between programming by idiom and programming by compiler-enforced rules. An idiom is just a best-practice design pattern. As sound an idea as RAII is, it doesn't scale very well to large-scale development, where teams are composed of developers ranging in talent, experience and commitment to code quality. While I have no doubt that an all-star team composed of the likes of Bjarne Stroustrup and Scott Meyers would do just fine with this idiom, the rest of us have to deal with people who aren't always experienced or, worse, aren't always concerned about code quality. Idioms require human intelligence to be understood and human care to be enforced. They easily break down in large teams where a minority of people (the experienced, caring ones) fix the careless bugs created by the majority. Using automated garbage collection to abstract memory management allows the first group to think about more important things and prevents the second group from making mistakes. The bottom line is that idioms don't scale very well, while compiler-enforced rules do. While it is true that garbage collection only addresses the memory resource, memory happens to be the most frequently used resource and the one most likely to produce hard-to-find bugs. Garbage collection all but eliminates an entire class of bugs: dangling pointers, double frees and most memory leaks.
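The scalability concern can be sketched in C++ by contrasting manual release with an RAII guard. The handle API here (`open_handle`, `close_handle`, `g_open_handles`) is hypothetical, a stand-in for a file handle or database connection. Both versions compile without complaint; only the guard version stays correct when someone adds an early return, which is the kind of careless mistake described above.

```cpp
#include <cassert>

// Hypothetical C-style API standing in for a file handle or database
// connection. g_open_handles counts handles that have not been closed.
int g_open_handles = 0;

int open_handle() {
    ++g_open_handles;
    return 42;  // dummy handle value
}

void close_handle(int /*handle*/) { --g_open_handles; }

// Manual management: correct only if every exit path remembers to close.
// The early return below forgets to, so the handle leaks.
bool manual_process(bool bad_input) {
    int h = open_handle();
    if (bad_input) {
        return false;  // bug: close_handle(h) is never called
    }
    close_handle(h);
    return true;
}

// RAII guard: the destructor closes the handle on every exit path.
class HandleGuard {
public:
    HandleGuard() : handle_(open_handle()) {}
    ~HandleGuard() { close_handle(handle_); }
    HandleGuard(const HandleGuard&) = delete;
    HandleGuard& operator=(const HandleGuard&) = delete;

private:
    int handle_;
};

bool raii_process(bool bad_input) {
    HandleGuard guard;
    if (bad_input) {
        return false;  // guard's destructor still closes the handle
    }
    return true;
}
```

After `manual_process(true)` the counter is left at 1 (a leaked handle), while `raii_process(true)` brings it back to 0. Note that the guard only helps if every developer chooses to use it; nothing in the compiler requires it, which is exactly the gap between an idiom and an enforced rule.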

Real programmers don't do GC
There are some who still believe that garbage collection is a crutch and that real programmers don't need it. Since there is no ISO-type standards body that can help us distinguish a real programmer from a fake one, we are out of luck. But let's see if garbage collection can remove our dependency on real programmers for memory management.

Garbage collection removes most of the brainpower otherwise spent on memory management. This benefits everyone. While this doesn't mean that developers never need to think about memory management, it does mean that most people can think about higher-level problems. At most, perhaps one person in every organization will still need to be sacrificed to the god of the heap and tasked with tuning the garbage collector by setting the right compaction parameters and nursery sizes.

Manually managing memory adds a level of complexity that takes human time and effort to overcome. In the end, the return on investment is not sufficient to justify it for non-legacy applications.

Works except when it doesn't
Some people still believe that there are inherent flaws with garbage collection and that it cannot reliably detect garbage.

Garbage collection has been researched extensively in academia since the 1960s. While implementation problems may have plagued early versions of Sun's HotSpot JVM, for example, these were addressed long ago. There are no inherent flaws in automated collection or in its ability to detect garbage.

Conclusion
It may surprise you to learn that I like C++. I really do. But I like it in the same way I like assembler: because I want to understand what happens under the hood. But when you look at it from a practical point of view, where business cycles drive software development, garbage collection is indispensable in large-scale development. Any talk of avoiding garbage collection at this point of software engineering’s evolution is just plain… garbage.

7 comments:

Jengu said...

If garbage collection were to be added to C++0x, destructors _would not_ be removed (that would break backward compatibility), so RAII would still be possible. Instead, presumably, GC-collected objects would simply not have their destructors executed, but stack-allocated objects (used for the RAII pattern) would.

Also, the fact that the majority of applications don't need real-time response is not a good argument for removing C++'s ability to provide it. Not everyone writes the kinds of applications you do, and there are large established code bases that depend on real-time response and will need to be maintained for years and years to come. There are jet fighters running C++ code, for example.

Stroustrup himself says in his book The Design and Evolution of C++ that he would like to see garbage collection in some form make its way into C++ eventually. But the committee has a lot of constraints in terms of huge existing code bases to take into account.

Nick Maiorano said...

While embedded systems running jet fighters need all the responsiveness the underlying hardware allows, most other apps can tolerate some delays. By manually managing memory in exchange for extra determinism, these apps end up paying too much in terms of complexity for something they don't really need. My main point was that GC critics are overselling the need for application determinism.

Thanks for your comments.

Bill said...

No, the need for determinism isn't being oversold. You're just not understanding the need. It's not about real-time requirements or responsiveness, but simply about determinism. After all, C# has the using block for just this reason. Other languages that rely on GC don't go that far, but their libraries certainly go out of their way to address the issue. Take Ruby, for instance. There's no end of helper methods that take a block and implement the RAII idiom internally (from memory, and almost guaranteed to not be valid Ruby):

File.open("example.txt") do |f|
  f.read
end

The only real problem with the deterministic argument is that it's possible to have determinism while keeping the GC. Most of the above alternatives are "by convention only" in a manner much worse than RAII in C++, but it wouldn't have to be that way. C#, for instance, could produce a compiler diagnostic if you used an IDisposable type without calling Dispose() either directly or indirectly (i.e. a using block).

Daniel said...

Excellent analysis! I tend to think that even in systems with hard real-time responsiveness constraints, a correctly implemented generational (or similar) GC is probably sufficient. A *lot* of research has gone into this area, and there's really no reason for modern garbage collection implementations to be "stop the world"-style. Yes, the system still has to slow down a bit during asynchronous memory release, but this is less of a slow-down than one would think, especially on multi-core systems. I still would probably stick with C++ for an air-traffic control system, but the most compelling reason for that decision would be hardware integration, *not* performance.

As an aside, I believe that most jet-fighter systems are built on FORTRAN or possibly Ada. C/C++ never really found its way into military hardware and thus has yet to take off (no pun intended) in the civilian counterpart industry.

adelle said...

The problem with IDisposable is: how do you encapsulate an IDisposable object?

Bill said...

adelle,

What's the issue? The encapsulating class implements IDisposable and delegates to the instance member's Dispose(). There's even a pattern for this. What am I missing?
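For comparison with the C# delegation pattern described here, a minimal C++ sketch of the same encapsulation follows. The `Connection`, `Repository` and `close_repository` names are hypothetical. Because members are destroyed automatically when their owner is destroyed, the encapsulating class needs no forwarding code, whereas an IDisposable wrapper has to call each member's Dispose() by hand.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical member resource that records when it is released.
class Connection {
public:
    Connection(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {}
    ~Connection() { log_.push_back("close " + name_); }
    Connection(const Connection&) = delete;
    Connection& operator=(const Connection&) = delete;

private:
    std::string name_;
    std::vector<std::string>& log_;
};

// Encapsulating class: it declares no destructor at all. The implicitly
// generated destructor destroys the members in reverse declaration order,
// so there is nothing to forward by hand.
class Repository {
public:
    explicit Repository(std::vector<std::string>& log)
        : primary_("primary", log), replica_("replica", log) {}

private:
    Connection primary_;
    Connection replica_;
};

// Creates a Repository in an inner scope and returns the release log.
std::vector<std::string> close_repository() {
    std::vector<std::string> log;
    {
        Repository repo(log);
    }  // repo destroyed here: replica_ first, then primary_
    return log;
}
```

`close_repository()` returns "close replica" followed by "close primary", since members are destroyed in the reverse order of their declaration.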

raw sausage said...

You might also want to take into account that the upcoming JVMs GC differently. They are practically throwing it into a constant low-priority subprocess, which tries to free a little from here and a little from there. The reason is that then you don't have to fall into complete GC cycle often (perhaps ever), which means you can usually allocate and guarantee that the allocation is fast.

That does require a good implementation, but I think it will be possible, especially with present-day multi-core processors. It will also reduce heap thrashing to near zero.

To comment the rest of the article: These C++ people are plain out of their fucking minds. GC... Failed? Yeah right.