Bobbing for Kernels

See Bob. See Bob bob. Bob, Bob, bob!

Scheme: Retrospective

Posted by kernelbob on January 9, 2010

The Scheme interpreter project is “on hold”. Maybe it’s “finished”. It’s time to take stock, see what I learned, what I accomplished, and what I could have done better.

The Original Goals

At the beginning of thie project, in September 2008, I wrote some goals.

I don’t know Scheme. I took a couple of classes that used Lisp way back when, but don’t really know Lisp or Scheme. There’s no better way to learn a language than to implement it. (-:

Goal met. I probably know more about the language’s nooks and crannies than the average practitioner.

I want to write something in C. My C is very rusty, since I’ve been using Python exclusively for the last year, and C++ for the last fifteen years.

Goal exceeded. Got to write lots of C. Also started using C at work again a few months later. Fully fluent in C again, though I’m much less tolerant of its limitations than I was, say, 20 years ago.

I want to write a garbage collector. They’ve always fascinated me, but I’ve never written one.

Goal met, just barely. I wrote a simple stop-and-copy GC. Copying collectors have always fascinated me and I wanted to see what limitations it would impose on the rest of the implementation. (See Roots below.)

I have some research ideas about garbage collection that I want to explore.

Not met. I wrote the simplest possible GC, and then went on to other parts of the system. I still have some untested ideas about GC.

In the 16 months since I wrote those goals, they’ve shifted somewhat. To the list, I’d add learning about Unicode, learning about macros, and understanding R6RS’s semantics (and Scheme semantics in general) in great detail.

What went well?

Memory Manager

I like the design of the memory subsystem a lot. It is easily extensible, fast enough, and it feels production quality.

Evaluator

The evaluator came together pretty well too. It took a while to get it right, but then it just worked while lots of things around it changed.

Reader

I’m very pleased with the reader. I should be — I rewrote it four or five times. The final result is a handcoded LL0 parser generator that builds the parsing tables at startup time. It exactly matches the language spec, because it uses the spec’s grammar as its input.

Scanner

The scanner is also very good, though it’s incomplete. It only recognizes a limited number syntax. I poked around a little with the idea of writing a regular expression compiler, sort of a handbuilt lex, but didn’t do much, because the existing scanner was fine.

What went badly?

R6RS

I started with the Revised^6 Report on Scheme, R6RS. That seemed like a good idea for several reasons: R6RS uses Unicode throughout, it has well defined exception semantics, it has a library system, it has fewer loose ends and places for the implementor to improvise (aka create incompatibility) than the other major contender, R5RS.

But R6RS is a very unpopular Scheme. Very few implementations are compliant, so it was hard to find good examples to study. I definitely would have made faster progress implementing R5RS, even with the need to improvise.

Exceptions

I never implemented any kind of exception handler. Any error, no matter how minor, crashes and core-dumps the interpreter. Now there’s a code base with several hundred error checks, with no differentiation between runtime errors, syntax errors, heap corruption, etc. That’s made debugging harder than it should have been. And the interpreter is much less useful than one that recovers from a missing parenthesis.

Roots

The way I handled root pointers in automatic C variables was error-prone and slow.

A root pointer is a pointer that is stored outside the heap and points into the heap. On garbage collection, all root pointers must be traced. Any memory allocation can trigger a GC. So any allocation might force a walk of all the roots, including those that are local variables in active C functions.

My solution was to keep a global (per-thread, actually, if threads ever got implemented) linked list of root pointers and define some macros to push automatic variables onto that list and pop them off. Those macros have to be called explicitly every time a root pointer goes into or out of scope.

I must have debugged a hundred occasions when a root pointer was incorrectly pushed or popped. It made me averse to writing code in C because it was so hard to keep the bookkeeping right. And continuously updating the linked list couldn’t have been good for performance.

Libraries

Originally, I tried to use the R6RS library mechanism as my primitive means for linking the interpreter together. That didn’t work so well. Then I tried linking everything in but implementing the libraries in C. That worked okay but was too rigid. At the end, I started on a redesign where all the primitives are in a global namespace, and a subset namespace is created for user code. Didn’t finish that.

Macros

I got bogged down in macro processing. I need to spend more time with a working macro implementation so I understand them better, I think.

Bootstrapping

This is the big one. My original ideas about how to bootstrap the interpreter were kind of naive. I thought I’d write the memory manager, some primitives, and an evaluator in C, then start writing more procedures in Scheme. Didn’t really know how macro expansion would fit into the process.

After building a lot of low-level code and getting to the point where my interpreter would read and evaluate code, I spent a long time trying to map out a path to implement everything in terms of slightly lower-level units. Made a lot of progress — there are over 100 primitives implemented. But I still didn’t find a path.

I probably invested a couple of hundred hours into reading and re-reading R6RS, R5RS, and Kent Dybvig’s papers on macro expansion looking for clues as to how the pieces fit together. There’s a lot of intro material on the web about building toy interpreters with no macros at all and a lot of research papers about the hard problems of building production-quality Scheme systems, but the middle ground is not well-covered.

So I’m taking a break. Maybe a long break.

I’ve pushed the current sources out to both github and gitorious. Download using either of these commands.

git clone git://github.com/kbob/kbscheme.git
git clone git://gitorious.org/kbscheme/mainline.git
About these ads

2 Responses to “Scheme: Retrospective”

  1. nobody said

    The best way I’ve found to deal with roots is to have your own fake stack frames containing an array of the roots in the current function. These frames are a singly-linked list, so tracing is similar but probably faster, and you don’t need to explicitly pop every root. You could also use them to propagate exceptions (better than setjmp/longjmp at least)

  2. kernelbob said

    Thanks, Nobody. I remain optimistic about the setjmp/longjmp mechanism. For one thing, it has zero overhead until the GC strikes.

    The C eval is more like a VM than a metacircular eval — it manages its continuations explicitly instead of recursing. So I’ll be using longjmp like a machine-level exception, and it will explicitly push a Scheme exception onto the continuation stack.

    Maybe I’m wrong/crazy. We’ll see.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
Follow

Get every new post delivered to your Inbox.

%d bloggers like this: