As described in the previous post, we have four different garbage collectors available in HotSpot JVM. There are some significant differences between them, but the actual concepts behind the algorithms which are used to do the actual job are quite similar. In this short post, I will try to explain three basic algorithms:
Before we move into the details, let’s make sure that we have a common understanding of what GC Roots are. These are the objects which are directly accessible from outside the heap. For example:
- Active threads
- Static variables
- Local variables (accessible via stack of a thread)
- JNI references
All of the algorithms discussed have the same mark phase. Marking phase is about traversing the whole object graph, starting from GC Roots. When GC visits the object, it marks it as accessible and thus alive. All the objects which are not reachable from GC Roots are garbage. Marking requires stop-the-world (STW) pauses, because the running application threads could interfere. How long the STW pause is, depends mostly on the number of visited objects.
After marking phase, we have the memory space which is occupied by visited (accessible via GC Roots) and unvisited objects. Sweep phase releases the memory fragments which contains unreachable objects. It is simple, but because the dead objects are not necessarily next to each other, we end up having a fragmented memory. That’s not bad per se, but trying to fit a too large object into the memory could potentially lead to OutOfMemoryError.
This algorithm fixes the problem with fragmented memory. After all alive objects are marked, they are moved to the beginning of the memory space. That helps to avoid having too fragmented memory, but compacting the heap isn’t for free. Copying objects and updating all references to them take time and it all happens during STW pause.
Mark-copy algorithm copies all alive objects to a new memory region. The previously occupied region is considered to be free. Next time mark-copy is executed, all the alive objects are moved back to the previous memory region. As you can imagine, this of course leads to a memory compaction. Unfortunately, it requires additional extra region large enough to fit all live objects at any given point in time.