Bobbing for Kernels

See Bob. See Bob bob. Bob, Bob, bob!

Scheming, part 3

Posted by kernelbob on October 20, 2008

GC is hard.  Let’s go shopping.

I’ve been working on my scheme interpreter a lot.  But I don’t have much to show for it.  What I do have, finally, is a working garbage collector (GC).  GC is a real pain to debug.  Every error shows up as a memory corruption bug.

The GC I implemented is a “Stop and Copy” method.  When the heap is full, the mutator pauses while the GC copies all of the accessible objects in the heap to a new heap.  Conceptually, this is pretty simple.  There is a known set of “root objects”, and each of those is copied into the new heap, and then each object in the new heap is scanned for pointers into the old heap.  Whenever one is found, that object is copied too, and the old copy is replaced by a forwarding pointer.  Here’s some pseudo-code to describe it.

def copy_heap(roots, new_heap):
    """copy everything reachable from the roots into the new heap."""
    for i, root in enumerate(roots):
        set_root(i, new_heap.append(root))
    scan = 0
    while scan < len(new_heap):
        obj = new_heap[scan]
        for i, ptr in enumerate(obj.pointers()):
            if not is_null(ptr) and not in_new_heap(ptr):
                if has_forwarding_ptr(ptr):
                    new_ptr = ptr.forward()
                else:
                    new_ptr = new_heap.append(ptr)
                    ptr.set_forward(new_ptr)
                obj.set_pointer(i, new_ptr)
        scan += 1

That’s the easy part.  It’s slightly more complicated in that you have to keep track of the size of each object, and each object has a set of virtual functions for accessing its size, pointers, number of pointers, and a few other operations.  But that’s just bookkeeping.

The hard part is finding the root pointers.  There are four global roots and one per-thread root, but then every automatic variable in every live function is potentially another root.  I had to analyze every code path to see if each variable could possibly be alive during a memory allocation.  I got it wrong many times.

I did write some nifty C macros to push and pop roots.  Here’s an example of a fictitious function to cons the cars and cdrs of two lists.  In Scheme, it’d look like this.

(define (fictitious a b)
  (cons
    (cons (car a) (car b))
    (cons (cdr a) (cdr b))))

And here’s how it’d be written in C.

obj_t *fictitious(obj_t *a, obj_t *b)
{
    PUSH_ROOT(a);
    PUSH_ROOT(b);
    AUTO_ROOT(cars, make_pair(pair_car(a), pair_car(b)));
    AUTO_ROOT(cdrs, make_pair(pair_cdr(a), pair_cdr(b)));
    RETURN(make_pair(cars, cdrs);
}

Every pointer value must be stored in a root during a call to an allocation function (e.g., make_pair()), because every pointer WILL change value when the allocator is called.

The roots are stored in a linked list on the stack.  Here’s what the first PUSH_ROOT call above expands to.

    root_descriptor_t auto_root_99 = {
        L"a",
        __func__,
        &a,
        NULL
    };
    push_root(&auto_root_99);

The last field, initialized to NULL, is the list link.  push_root will link it to the head of the list.

void push_root(root_descriptor_t *desc)
{
    desc->rd_next = thread_roots;
    thread_roots = desc;
}

So that’s how the roots are kept.  It adds a fair amount of overhead, keeping the list and traversing it.

One more note.  I did some things to make the GC more deterministic and make errors show up sooner.  First, I allocated 100 heaps instead of just two, then used the mprotect() syscall to make all but the current one unreadable.  That helped me find illegal memory refs sooner.  Second, I wrote a heap verification routine that checks the heap for consistency, and I call it several times per operation.  And third, I made the allocator copy the whole heap before every allocation.  That means that any non-root pointer was pointing to unreadable memory after every allocation.

So I have achieved my life goal of writing a GC.  Now I want to make it better.

Advertisements

One Response to “Scheming, part 3”

  1. […] Scheming, part 3 […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: