Posted by kernelbob on January 9, 2010
The Scheme interpreter project is “on hold”. Maybe it’s “finished”. It’s time to take stock, see what I learned, what I accomplished, and what I could have done better.
The Original Goals
At the beginning of thie project, in September 2008, I wrote some goals.
I don’t know Scheme. I took a couple of classes that used Lisp way back when, but don’t really know Lisp or Scheme. There’s no better way to learn a language than to implement it. (-:
Goal met. I probably know more about the language’s nooks and crannies than the average practitioner.
I want to write something in C. My C is very rusty, since I’ve been using Python exclusively for the last year, and C++ for the last fifteen years.
Goal exceeded. Got to write lots of C. Also started using C at work again a few months later. Fully fluent in C again, though I’m much less tolerant of its limitations than I was, say, 20 years ago.
I want to write a garbage collector. They’ve always fascinated me, but I’ve never written one.
Goal met, just barely. I wrote a simple stop-and-copy GC. Copying collectors have always fascinated me and I wanted to see what limitations it would impose on the rest of the implementation. (See Roots below.)
I have some research ideas about garbage collection that I want to explore.
Not met. I wrote the simplest possible GC, and then went on to other parts of the system. I still have some untested ideas about GC.
In the 16 months since I wrote those goals, they’ve shifted somewhat. To the list, I’d add learning about Unicode, learning about macros, and understanding R6RS’s semantics (and Scheme semantics in general) in great detail.
What went well?
I like the design of the memory subsystem a lot. It is easily extensible, fast enough, and it feels production quality.
The evaluator came together pretty well too. It took a while to get it right, but then it just worked while lots of things around it changed.
I’m very pleased with the reader. I should be — I rewrote it four or five times. The final result is a handcoded LL0 parser generator that builds the parsing tables at startup time. It exactly matches the language spec, because it uses the spec’s grammar as its input.
The scanner is also very good, though it’s incomplete. It only recognizes a limited number syntax. I poked around a little with the idea of writing a regular expression compiler, sort of a handbuilt lex, but didn’t do much, because the existing scanner was fine.
What went badly?
I started with the Revised^6 Report on Scheme, R6RS. That seemed like a good idea for several reasons: R6RS uses Unicode throughout, it has well defined exception semantics, it has a library system, it has fewer loose ends and places for the implementor to improvise (aka create incompatibility) than the other major contender, R5RS.
But R6RS is a very unpopular Scheme. Very few implementations are compliant, so it was hard to find good examples to study. I definitely would have made faster progress implementing R5RS, even with the need to improvise.
I never implemented any kind of exception handler. Any error, no matter how minor, crashes and core-dumps the interpreter. Now there’s a code base with several hundred error checks, with no differentiation between runtime errors, syntax errors, heap corruption, etc. That’s made debugging harder than it should have been. And the interpreter is much less useful than one that recovers from a missing parenthesis.
The way I handled root pointers in automatic C variables was error-prone and slow.
A root pointer is a pointer that is stored outside the heap and points into the heap. On garbage collection, all root pointers must be traced. Any memory allocation can trigger a GC. So any allocation might force a walk of all the roots, including those that are local variables in active C functions.
My solution was to keep a global (per-thread, actually, if threads ever got implemented) linked list of root pointers and define some macros to push automatic variables onto that list and pop them off. Those macros have to be called explicitly every time a root pointer goes into or out of scope.
I must have debugged a hundred occasions when a root pointer was incorrectly pushed or popped. It made me averse to writing code in C because it was so hard to keep the bookkeeping right. And continuously updating the linked list couldn’t have been good for performance.
Originally, I tried to use the R6RS library mechanism as my primitive means for linking the interpreter together. That didn’t work so well. Then I tried linking everything in but implementing the libraries in C. That worked okay but was too rigid. At the end, I started on a redesign where all the primitives are in a global namespace, and a subset namespace is created for user code. Didn’t finish that.
I got bogged down in macro processing. I need to spend more time with a working macro implementation so I understand them better, I think.
This is the big one. My original ideas about how to bootstrap the interpreter were kind of naive. I thought I’d write the memory manager, some primitives, and an evaluator in C, then start writing more procedures in Scheme. Didn’t really know how macro expansion would fit into the process.
After building a lot of low-level code and getting to the point where my interpreter would read and evaluate code, I spent a long time trying to map out a path to implement everything in terms of slightly lower-level units. Made a lot of progress — there are over 100 primitives implemented. But I still didn’t find a path.
I probably invested a couple of hundred hours into reading and re-reading R6RS, R5RS, and Kent Dybvig’s papers on macro expansion looking for clues as to how the pieces fit together. There’s a lot of intro material on the web about building toy interpreters with no macros at all and a lot of research papers about the hard problems of building production-quality Scheme systems, but the middle ground is not well-covered.
So I’m taking a break. Maybe a long break.
I’ve pushed the current sources out to both github and gitorious. Download using either of these commands.
git clone git://github.com/kbob/kbscheme.git git clone git://gitorious.org/kbscheme/mainline.git