Bobbing for Kernels

See Bob. See Bob bob. Bob, Bob, bob!

Scheme status

Posted by kernelbob on October 15, 2009

I’ve written here before about my efforts to write a Scheme interpreter from scratch.  The last update was in May, but I’m still working on it.  I’ve made 60 git commits since then.  The major thing I’m trying to do is implement macros, but I’m having a hard time of it.  No matter.  When they’re done, I’ll understand macros thoroughly.

One area that’s changed is that the interpreter is now featureful enough that I’m writing about 90% of the new code in Scheme instead of C.  So I’m getting some practice writing in Scheme.  It has about 1,500 lines of Scheme now.  Lots of standard procedures, partial reimplementation of the library system, and the aforementioned macro expander, which has a lot of code even if it isn’t complete.

Macros.  R6RS specifies both syntax-rules and syntax-case, and it implies that you can build your own macros out of lower-level routines.  Syntax-case is the most general, so that’s what I’m implementing.  I have repeatedly read three papers by syntax-case’s inventor, Kent Dybvig, and have just about worn out the paper printouts I carry everywhere with me.  I almost understand it… (-:

First, a user’s guide to syntax-case.  Lots of examples ranging from very simple to fairly complex.

http://www.cs.indiana.edu/~dyb/pubs/tr356.pdf

Second, the paper that presents the algorithm.

http://www.cs.indiana.edu/~dyb/pubs/LaSC-5-4-pp295-326.pdf

Third, a chapter from the book, Beautiful Code, where Dybvig presents a toy implementation of his algorithm.  It’s a toy because it implements syntax-case in terms of syntax-case, which is cheating.

http://www.cs.indiana.edu/~dyb/pubs/bc-syntax-case.pdf

So I’m getting closer.  As of this week, I can do α-substitution (see below) on lambda expressions.

Anyway, enough about macros.  Let’s talk about memory.

Very early on, I came up with a pretty good object memory implementation, I thought.  It’s object oriented.  Each object has a header word, and the header word points to what’s basically a virtual function table.  The VF table makes it possible to identify each object’s type, and has virtual method pointers for the operations the GC needs such as getting the object’s size or finding each of its pointers.  There are some abstract classes, fixvec and mixvec, to make writing various fixed-size objects easy.  fixvec is a parameterized type that implements a fixed-size vector of object pointers.  So a pair (aka a cons cell) is a fixvec(2), and the pair implementation doesn’t do much more than map pair_car() to fixvec2_get(0).  Mixvec is similar.  It’s a fixed-size object with M non-pointer words and N pointer words.  A binding, for example, has two pointers (name and value) and one non-pointer word (type flags).

So it’s good.  At this point, there are about 13 object classes, and I haven’t had to redo the core.  But then I realized that I have the world’s only Scheme that doesn’t store simple types  like fixnums, characters, and booleans as immediate values.  I.e., I store a fixnum in an object.  It has one word of header and one word of integer.  The vast majority of Schemes use tagged pointers and store a fixnum directly in the pointer.  Chicken Scheme has a typical implementation.

I was aware of that technique all the time, but didn’t do it that way, for some reason.  Now I’m thinking it would be a good optimization.  After syntax-case works.

Also, let’s talk about anonymous symbols.  A Scheme symbol has the property that it’s unique.  There is only ever one symbol named foo, for example.  Every foo everywhere in the system refers to the same symbol object.  My Scheme stores all its symbols on a long list, and when it tries to create a symbol named foo, it searches the list for an existing symbol by that name.  If it finds one, it returns that.  Otherwise, it creates a new symbol and inserts it at the head of the list.  (N.B., this means that symbols are never garbage collected.  That’s more or less okay, as they have long lifetimes.)

But the macro expander generates lots of symbols.  It uses a technique called α-substitution, where it renames all the local variables as it expands, generating new names for each.  I’d seen, somewhere in my Lisp/Scheme reading, the idea of symbols that don’t have names until they’re printed, and I implemented that.  It worked out really well.  The string object module has a new routine to create a symbol.

obj_t *make_anonymous_symbol(void)
{
    return alloc_symbol(NIL);
}

alloc_symbol() is the usual symbol constructor, and the argument (NIL) is a Scheme string for the symbol’s name.  alloc_symbol() does not link the symbol onto the big list.  So that was easy.  Then we just need a way to give the symbol a name when it’s printed.

The symbol name accessor used to look like this.

obj_t *symbol_name(obj_t *symbol)
{
    return fixvec1_get_ptr(symbol, 0);
}

Now it looks like this.

obj_t *symbol_name(obj_t *symbol)
{
    obj_t *name = fixvec1_get_ptr(symbol, 0);
    if (is_null(name)) {
        wchar_t name_buf[12];
        while (true) {
            ssize_t name_len = swprintf(name_buf,
                                        sizeof name_buf,
                                        L"g%04d",
                                        ++gen_name_counter);
            name = make_string_from_chars(name_buf, name_len);
            if (find_symbol(name))
                continue;
            fixvec1_set_ptr(symbol, 0, name);
            all_symbols_list = make_pair(symbol,
                                         all_symbols_list);
            break;
        }
    }
    return name;
}

That’s a little more complicated, but the idea is that when we need the symbol’s name, we start generating names starting with g0001, g0002, etc., until we find one that isn’t in use.  then we add the symbol to the big list of symbols so it remains unique with its new name.

The “g” in the created name is for “generated”.

This is cool for several reasons.  First, most anonymous symbols never have names.  Second, they can be garbage collected.  Third, they don’t make the big symbol list any longer/slower to search.  Fourth, the path through symbol_name() for regular (named) symbols only slows down by a single test for NIL.

You may have noticed that symbol names are Unicode (wchar_t).  That’s another thing I’ve done since May.  The whole Scheme system is Unicode.  When I started the interpreter, I used libunicode, which was already installed.  That was a bad choice.  When I upgraded to Ubuntu 9.04, libunicode disappeared.  It had been deprecated for the last five years, but I didn’t notice.  In June, after looking long hard at the available Unicode libraries, I wrote my own Unicode support from scratch.  It parses Unicode.txt (part of the spec distributed by the Unicode Consortium) and builds some character class tables.

So that’s a quick overview of the last five months.  Sorry for the lengths – length of time between updates and length of this post.

Advertisements

3 Responses to “Scheme status”

  1. nobody said

    implementing syntax-case in syntax-case might be cheating, but it’s not a toy. the portable syntax-case expander is written with itself and is the expander in many scheme implementations.

    see: https://launchpad.net/r6rs-libraries/

    note that wchar_t is not very portable because its size is not specified by the standard.

  2. kernelbob said

    Okay, toy is too strong a word. Still, explaining expansion in terms of syntax-case hides too much magic. One of my goals is to create a self-bootstrapping system, so I’m using a tiny subset of Scheme.

    And you’re right about wchar_t. It’s on my (very long) to-do list to use a different type. But for now, I get a lot of leverage from libc’s character handling, such as the call to printf above.

    Thanks for your comments.

  3. Nobody said

    I agree, self-hosting systems are not great for teaching.

    Anyway you seem to have done your homework but I’ll mention these just in case you haven’t seen them.

    Alexpander expands R5RS macros, and doesn’t rely on macros at all.

    http://petrofsky.org/src/alexpander.scm

    Andre van Tonder’s R6RS expander needs macros to boot itself, but it only has some simple syntax-rules macros, so it can be compiled with nearly any Scheme system. In fact, maybe it could be combined with Alexpander, so you could compile it on a Scheme without macros. In any case I found it to be much simpler than psyntax.

    http://www.het.brown.edu/people/andre/macros/

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

 
%d bloggers like this: