Bobbing for Kernels

See Bob. See Bob bob. Bob, Bob, bob!

Scheming, part 2

Posted by kernelbob on October 9, 2008

I’ve been working on my Scheme interpreter.  I just got the nonrecursive evaluator working, so it seems like a good time to take stock.

Why am I writing a Scheme?  Is it because there aren’t enough Schemes in the world?  No, I’m looking at it as an opportunity to learn the language, get back into C programming, and play with some GC ideas I have.

The trouble is, I haven’t actually written a GC (garbage collector).  Instead, my Scheme just calls malloc() and neglects to call free().  Someday, it’ll run out of memory, if I ever run a nontrivial program in it.

Here’s what I have so far.

Reader: understands integers (fixnums), symbols, and lists.

Arithmetic ops: number? integer? = < > <= >= + – * div mod abs.  These operate on fixnums only.

Other ops and special forms: lambda quote define set! if.  Notable omissions: cons car cdr pair? null?.

A minimal testing framework, which I’ll demonstrate.  I’ve been working on eval, so it’s mostly testing evaluation.

~/scheme> ./scheme -t
OK () => ()
OK 123 => 123
OK + => #<proc-C>
OK (+) => 0
OK (+ 3) => 3
OK (+ 3 4) => 7
OK (+ (+ 1 2) (+ 3 4)) => 10
OK ((lambda (x) (+ x 3)) 4) => 7
OK ((lambda (x) (+ x x)) 4) => 8
OK ((lambda (x) (+) (+ x 3)) 4) => 7
OK ((lambda (x) (+ x 3) (+)) 4) => 0

And the new eval.  The first eval was “metacircular”.  I’m not sure if that’s a common term of art, but it means eval was defined in terms of eval.  I.e., it was a suite of mutually recursive functions.  The second eval never quite came to life; I abandoned it before it was working.  The third and current eval has goals of using constant-bounded C stack space and being capable of implementing call/cc and continuable exceptions.  It’s not quite there yet, because activation frames are not yet real heap objects. But I think it’ll work.

I defined an activation frame (similar to a stack frame, but there is no stack).  It has registers storing all the info that needs to be remembered while a subexpression is being evaluated.  The activation frame is immutable (except for one register, value); that way call/cc can save one and the computation can be resumed later.  So the frames form an inverse tree — each frame points at its parent, but potentially has several children.

The execution model is similar to Forth’s threaded code.  Eval is broken into four “blocks”.  A block is a C function that is called directly from the main loop and makes progress toward evaluating an expression.  A block is passed an activation frame which contains all its arguments, and it returns the next activation frame.

For example, the first block is b_eval().  b_eval gets the subject, the expression to evaluate, and the current environment from its frame.  Then it looks at the expression.  If it’s a self-evaluating expression, b_eval pops up to the parent frame and sets its value register to the subject.  If the subject is a symbol, b_eval looks it up in the environment and passes its value back in the same way.  If the subject is an application (applies a function to arguments), then b_eval creates two activation records.  The first recursively evaluates the function, then the second begins accumulating the arguments.

Here’s the code.

    if (is_self_evaluating(F_SUBJ))
    if (is_symbol(F_SUBJ))
    if (is_application(F_SUBJ)) {
        obj_t *proc = pair_car(F_SUBJ);
        obj_t *args = pair_cdr(F_SUBJ);
        CALL_THEN_GOTO((b_eval, proc, F_ENV),
                       (b_accum_operator, args, F_ENV));

You can see I’ve been abusing the C preprocessor. F_SUBJ is short for ((obj_t *)FRAME->ef_subject), and F_ENV is similar. RETURN, RAISE, and CALL_THEN_GOTO manipulate activation frames. CALL_THEN_GOTO((callee, callee_args…), (target, target_args…)) pushes a frame for the callee which points to a frame to invoke the target.

The main loop just calls blocks.  Like this.  (F_CONT is (FRAME->ef_continuation).)

    while (F_CONT) {
        FRAME = (*F_CONT)(FRAME);

This whole thing is tricky enough that it makes my brain hurt, but that’s why I do it.

I think the next thing is to put in the minimal set of list processing ops listed above.

Then it’s time to write a real GC.

I’m thinking each object will start with a pointer to its per-class info.  The per-class info will include a virtual function table.  There’ll be code to subclass existing classes at run time.  So maybe, e.g.,  there’ll be a fixed-length vector class, and activation frame will be a subclass of fixed vector.  Lots of things can be fixed-length vectors: cons cells, procedures, symbols…


One Response to “Scheming, part 2”

  1. […] Scheming, part 2 […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: