Garbage Collection: The Invisible Janitor Inside Your Code
Mark-and-sweep, reference counting, generational collection — how your runtime silently frees memory you forgot about.
Neural Download
Installing mental model for garbage collection.
Every time your program creates an object — a string, a list, a user record — it claims a chunk of memory. But memory is finite. If you keep creating objects without releasing them, you'll run out.
In C, you manage this yourself. You allocate, you free. Forget to free? Memory leak. Free too early? Use-after-free bug. Free twice? Crash. Manual memory management is a minefield.
So most modern languages made a deal. You create objects whenever you want. You never free them. And the language figures out when objects are no longer needed, and reclaims their memory automatically.
That's garbage collection. The invisible janitor running inside Python, Java, JavaScript, Go, and almost every language you use daily. But how does the garbage collector know which objects are garbage? That turns out to be a surprisingly deep question.
The simplest approach is reference counting. Every object gets a counter. When something points to this object, the counter goes up. When a reference is removed, the counter goes down. When the counter hits zero, the object is garbage. Delete it immediately.
Python uses reference counting as its primary strategy. Create a list, the count is one. Assign it to another variable, the count goes to two. Delete one variable, back to one. Delete the other, zero — the list is freed instantly.
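This is easy to watch in CPython, where `sys.getrefcount` exposes the counter directly. One caveat: the call itself temporarily holds a reference to its argument, so the reported number is always one higher than the "real" count. A minimal sketch:

```python
import sys

# sys.getrefcount reports one extra reference: the temporary
# argument passed into the call itself (CPython-specific behavior).
items = [1, 2, 3]
print(sys.getrefcount(items))  # 2: the name `items` plus the call's argument

alias = items                  # a second name now points at the same list
print(sys.getrefcount(items))  # 3

del alias                      # drop one reference
print(sys.getrefcount(items))  # back to 2
```

When the last reference disappears, CPython frees the list on the spot, with no waiting for a collector to run.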
It's elegant. It's simple. And it has a fatal flaw.
Imagine object A points to object B, and object B points back to A. Both have a reference count of one. Now delete every external reference to them. Their counts never hit zero — because they reference each other. They're unreachable garbage, but reference counting can't see it.
These reference cycles are real. They happen constantly in data structures like doubly linked lists, parent-child trees, and callback registrations. Python handles this with a secondary cycle detector. But reference counting alone isn't enough.
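You can trigger that secondary detector by hand. In the sketch below, two objects reference each other, all external references are dropped, and `gc.collect` reports how many unreachable objects it found (the exact count varies by Python version, since it includes internals like each object's attribute dictionary):

```python
import gc

class Node:
    def __init__(self):
        self.partner = None

# Build a two-object cycle: a -> b -> a
a, b = Node(), Node()
a.partner = b
b.partner = a

# Drop all external references. Each Node's reference count is
# still 1 (they point at each other), so refcounting alone
# will never free them.
del a, b

# Python's cycle detector finds the unreachable cycle and frees it.
unreachable = gc.collect()
print(unreachable)  # number of unreachable objects found (at least the two Nodes)
```

Without that `gc.collect` (or the collector's automatic runs), the two Nodes would leak for the life of the process.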
Mark and sweep takes a completely different approach. Instead of counting references, it asks a simple question: can I reach this object from the program's roots?
Roots are the starting points — local variables on the stack, global variables, CPU registers. Anything directly accessible to running code.
The mark phase starts at these roots and follows every reference, like exploring a graph. Each reachable object gets marked as "alive." Object A is on the stack — marked. A points to B — marked. B points to C — marked.
When the traversal is complete, anything not marked is unreachable. The program can never touch it again. It's garbage.
The sweep phase walks through all of memory. Every unmarked object gets freed. Its memory is reclaimed and made available for new objects.
This solves the cycle problem completely. If A and B point to each other but nothing reachable points to either of them, neither gets marked. Both get swept. No counter tricks needed.
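The whole algorithm fits in a few lines. Here is a toy mark-and-sweep over a hand-built object graph; `Obj`, `mark`, and `sweep` are illustrative names, not any real runtime's API:

```python
# A toy mark-and-sweep. `heap` stands in for all allocated objects;
# `roots` stands in for the stack, globals, and registers.
class Obj:
    def __init__(self, name):
        self.name = name
        self.refs = []       # outgoing references to other objects
        self.marked = False

def mark(obj):
    """Mark phase: follow every reference from a root, like graph traversal."""
    if obj.marked:
        return               # already visited; avoids infinite loops on cycles
    obj.marked = True
    for ref in obj.refs:
        mark(ref)

def sweep(heap):
    """Sweep phase: keep marked objects, discard the rest."""
    survivors = [o for o in heap if o.marked]
    for o in survivors:
        o.marked = False     # reset marks for the next collection
    return survivors

# A -> B -> C is reachable; D and E form an unreachable cycle.
a, b, c, d, e = (Obj(n) for n in "ABCDE")
a.refs = [b]; b.refs = [c]
d.refs = [e]; e.refs = [d]

heap = [a, b, c, d, e]
roots = [a]

for root in roots:
    mark(root)
alive = sweep(heap)
print([o.name for o in alive])  # ['A', 'B', 'C'] — the D/E cycle is swept
```

Notice that D and E never even come up during marking: the cycle problem dissolves because reachability, not counting, decides what lives.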
The tradeoff? The collector has to pause the program to do its work. The whole world stops while it marks and sweeps. For a program with millions of objects, this pause can take tens of milliseconds. That's an eternity in a server handling thousands of requests per second.
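You can get a feel for this yourself by forcing a full collection over a large heap. The sketch below times one `gc.collect` call in Python; the absolute number depends entirely on heap size and hardware, so treat it as a measurement technique, not a benchmark:

```python
import gc
import time

# Allocate a large object graph so the collector has real work to do.
junk = [[i] for i in range(500_000)]

start = time.perf_counter()
gc.collect()                 # force a stop-the-world full collection
pause = time.perf_counter() - start

# Milliseconds matter: a server handling thousands of requests per
# second stalls every in-flight request for this long.
print(f"full collection took {pause * 1000:.2f} ms")
```

Scale the heap up and the pause grows with it, which is exactly why unbounded stop-the-world collection doesn't survive in latency-sensitive systems.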
Here's an observation that changed everything. Most objects die young. A temporary string created inside a function? Gone by the next line. A loop variable? Dead after the loop ends. Studies show that over 90% of objects become garbage within milliseconds of being created.
But some objects live forever. Your database connection pool, your configuration, your cached data — created at startup and used for the lifetime of the program.
Generational garbage collection exploits this pattern. It divides memory into generations. New objects go into the young generation — a small region that gets collected frequently. Objects that survive multiple collections get promoted to the old generation — a larger region collected rarely.
Young generation collections are fast because most objects are already dead. You only need to mark the few survivors and promote them. The rest — the vast majority — get swept in one pass.
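CPython's cycle collector works this way too, and the `gc` module lets you peek at it. Assuming a standard CPython build, there are three generations, and the youngest is collected far more often than the oldest:

```python
import gc

# Allocation thresholds that trigger a collection of each generation.
# Generation 0 (the youngest) fills and is collected most frequently.
print(gc.get_threshold())

# Current allocation counts per generation right now.
print(gc.get_count())

# Objects that survive a generation-0 collection are promoted to
# generation 1, then generation 2 — the scheme described above.
```

The exact threshold values vary by Python version, but the shape is always the same: a small, hot young generation and progressively rarer sweeps of the old ones.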
Java's G1 collector, JavaScript's V8 engine, and CPython's cycle collector all use generational strategies. It's why modern garbage-collected languages can handle millions of objects without noticeable pauses. The garbage collector is always running, but it's usually only cleaning the youngest, smallest region of memory.
Every garbage collector makes a tradeoff between three things: throughput, latency, and memory overhead.
Throughput is how much CPU time goes to your actual program versus garbage collection. Latency is how long the world stops when the collector runs. Memory overhead is how much extra memory the collector needs to do its job.
You can't optimize all three. High throughput collectors batch work and run less often — but when they do run, the pause is longer. Low latency collectors run concurrently with your program — but they use more CPU and memory to coordinate.
Go chose latency. Its collector runs concurrently, keeping pauses under a millisecond. Java chose throughput — the G1 collector batches work for maximum efficiency, but pauses can hit tens of milliseconds. Python's reference counting gives instant cleanup but burns CPU incrementing and decrementing counters on every assignment.
Rust took a different path entirely. No garbage collector at all. The borrow checker proves at compile time when memory can be freed. Zero runtime cost. But you pay for it with the learning curve — fighting the borrow checker is a rite of passage.
The next time your program silently handles memory without you thinking about it — that's the garbage collector. The most underappreciated algorithm in computing, running billions of times per second across every device on Earth.
Cognitive architecture... updated.
