Scheming, part 2
Posted by kernelbob on October 9, 2008
I’ve been working on my Scheme interpreter. I just got the nonrecursive evaluator working, so it seems like a good time to take stock.
Why am I writing a Scheme? Is it because there aren’t enough Schemes in the world? No, I’m looking at it as an opportunity to learn the language, get back into C programming, and play with some GC ideas I have.
The trouble is, I haven’t actually written a GC (garbage collector). Instead, my Scheme just calls malloc() and neglects to call free(). Someday, it’ll run out of memory, if I ever run a nontrivial program in it.
Here’s what I have so far.
Reader: understands integers (fixnums), symbols, and lists.
Arithmetic ops: number? integer? = < > <= >= + – * div mod abs. These operate on fixnums only.
Other ops and special forms: lambda quote define set! if. Notable omissions: cons car cdr pair? null?.
A minimal testing framework, which I’ll demonstrate. I’ve been working on eval, so it’s mostly testing evaluation.
~/scheme> ./scheme -t
OK () => ()
OK 123 => 123
OK + => #<proc-C>
OK (+) => 0
OK (+ 3) => 3
OK (+ 3 4) => 7
OK (+ (+ 1 2) (+ 3 4)) => 10
OK ((lambda (x) (+ x 3)) 4) => 7
OK ((lambda (x) (+ x x)) 4) => 8
OK ((lambda (x) (+) (+ x 3)) 4) => 7
OK ((lambda (x) (+ x 3) (+)) 4) => 0
~/scheme>
And the new eval. The first eval was “metacircular”. I’m not sure if that’s a common term of art, but it means eval was defined in terms of eval. I.e., it was a suite of mutually recursive functions. The second eval never quite came to life; I abandoned it before it was working. The third and current eval has goals of using constant-bounded C stack space and being capable of implementing call/cc and continuable exceptions. It’s not quite there yet, because activation frames are not yet real heap objects. But I think it’ll work.
I defined an activation frame (similar to a stack frame, but there is no stack). It has registers storing all the info that needs to be remembered while a subexpression is being evaluated. The activation frame is immutable (except for one register, value); that way call/cc can save one and the computation can be resumed later. So the frames form an inverse tree — each frame points at its parent, but potentially has several children.
The execution model is similar to Forth’s threaded code. Eval is broken into four “blocks”. A block is a C function that is called directly from the main loop and makes progress toward evaluating an expression. A block is passed an activation frame which contains all its arguments, and it returns the next activation frame.
For example, the first block is b_eval(). b_eval gets the subject, the expression to evaluate, and the current environment from its frame. Then it looks at the expression. If it’s a self-evaluating expression, b_eval pops up to the parent frame and sets its value register to the subject. If the subject is a symbol, b_eval looks it up in the environment and passes its value back in the same way. If the subject is an application (applies a function to arguments), then b_eval creates two activation records. The first recursively evaluates the function, then the second begins accumulating the arguments.
Here’s the code.
DEFINE_EXTERN_BLOCK(b_eval)
{
if (is_self_evaluating(F_SUBJ))
RETURN(F_SUBJ);
if (is_symbol(F_SUBJ))
RETURN(eval_symbol(FRAME));
if (is_application(F_SUBJ)) {
obj_t *proc = pair_car(F_SUBJ);
obj_t *args = pair_cdr(F_SUBJ);
CALL_THEN_GOTO((b_eval, proc, F_ENV),
(b_accum_operator, args, F_ENV));
}
RAISE(&syntax);
}
You can see I’ve been abusing the C preprocessor. F_SUBJ is short for ((obj_t *)FRAME->ef_subject), and F_ENV is similar. RETURN, RAISE, and CALL_THEN_GOTO manipulate activation frames. CALL_THEN_GOTO((callee, callee_args…), (target, target_args…)) pushes a frame for the callee which points to a frame to invoke the target.
The main loop just calls blocks. Like this. (F_CONT is (FRAME->ef_continuation).)
while (F_CONT) {
FRAME = (*F_CONT)(FRAME);
}
This whole thing is tricky enough that it makes my brain hurt, but that’s why I do it.
I think the next thing is to put in the minimal set of list processing ops listed above.
Then it’s time to write a real GC.
I’m thinking each object will start with a pointer to its per-class info. The per-class info will include a virtual function table. There’ll be code to subclass existing classes at run time. So maybe, e.g., there’ll be a fixed-length vector class, and activation frame will be a subclass of fixed vector. Lots of things can be fixed-length vectors: cons cells, procedures, symbols…
Scheming, part 7. The Reader. « Bobbing for Kernels said
[...] Scheming, part 2 [...]