Ruby compiler tutorial

by topfunky — 2 May 2008

A multi-part tutorial on writing a compiler in Ruby.

Comments

I’ve found this series quite interesting, although the fact that he’s compiling directly to x86 and using GCC output as a sort of template continues to disenchant me somewhat. Targeting LLVM (or any virtual machine) would make a lot more sense.

PeterCooper — 2 May 2008

I’m the one writing it, and the choice of x86 assembler was fairly pragmatic.

LLVM, for example, is a horribly complex beast for what I personally see as very little benefit. It would also lock the whole thing into a dependency I don’t want to drag along, and it would force me to deal not only with the target architecture but also with LLVM’s. No abstraction is perfect.

In any case, one of my goals is to ensure everything in it can be self-hosted.

As the series progresses you’ll see more and more of the assembler output split out into a separate small code emitter library. This is a fairly traditional approach to making a compiler easily retargetable, and I think you’ll be surprised how little code would actually need to be rewritten to retarget a relatively full-featured compiler.
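
To illustrate the emitter split described above, here is a minimal sketch in Ruby. It is not code from the series: the class names (X86Emitter, Compiler) and methods (function, load_int) are hypothetical and chosen only for illustration. The assumption is that all target-specific text lives in the emitter, so retargeting would mean writing another emitter with the same methods.

```ruby
# Hypothetical emitter: every piece of x86-specific text lives in this class.
class X86Emitter
  def initialize
    @lines = []
  end

  # Append one instruction; operands are joined AT&T-style.
  def emit(op, *args)
    @lines << "\t#{op}\t#{args.join(', ')}".rstrip
  end

  def label(name)
    @lines << "#{name}:"
  end

  # Wrap the block's output in a standard 32-bit prologue and epilogue.
  def function(name)
    @lines << "\t.globl\t#{name}"
    label(name)
    emit(:pushl, "%ebp")
    emit(:movl, "%esp", "%ebp")
    yield
    emit(:leave)
    emit(:ret)
  end

  # Load an integer constant into the return register.
  def load_int(value)
    emit(:movl, "$#{value}", "%eax")
  end

  def to_s
    @lines.join("\n") + "\n"
  end
end

# Hypothetical target-independent part: it only talks to the emitter's
# methods and never writes assembler text itself.
class Compiler
  def initialize(emitter)
    @e = emitter
  end

  # Compile a function that just returns an integer constant.
  def compile_function(name, int_value)
    @e.function(name) { @e.load_int(int_value) }
    @e.to_s
  end
end

asm = Compiler.new(X86Emitter.new).compile_function("main", 42)
puts asm
```

In this sketch, supporting another architecture would mean supplying a different emitter object to Compiler.new, while the Compiler class stays unchanged.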

As for other virtual machines: Not interested. The potential for performance and optimizations by targeting raw hardware still makes it well worth it, and besides I want something that at least in theory could be used as a systems programming language on par with something like C, but with modern features. Targeting a VM throws that out the window.

Another thing I’m considering doing when I get there is to, as an option, implement something called semantic dictionary encoding, which is - to put it simply - a form of bytecode combined with a JIT (though the bytecode isn’t linear, but represents a compressed form of a tree representation of the program).

Vidar Hokstad — 2 May 2008

If you’re actually making a low-level systems language, your approach makes sense, but that’s an incredibly rare undertaking nowadays. Typically, new languages developed nowadays are higher-level abstractions or syntax experiments. After all, we already have a wide range of high-performance low-level system languages.

It does not bother me at all that you’re developing a systems language, but I’d be concerned if people got the wrong idea and assumed that your technique is a viable way to go for developing most modern, high-level languages. Virtual machines aren’t just a fashion; they have significant benefits and are typically the best way to go, especially if you’re just dabbling.

PeterCooper — 2 May 2008

Excellent points, Peter, and good points to remind us of. Many may not appreciate the distinctions drawn. I’m still enjoying Vidar’s trek, though. I have huge respect for those who undertake such in-depth tutorials, sharing them with others. Thanks, both.

jt — 2 May 2008

Oh, I’m definitely enjoying it, and really appreciate Vidar writing it :)

I’m just aware that so few people write accessible content on this topic that what is written can be highly influential on future language implementers.

One particularly influential set of tutorials along the same lines was Crenshaw’s Let’s Build A Compiler (Pascal not seeming such an odd choice back then ;-)).

PeterCooper — 2 May 2008

Duh, forgot to put my name on the comment above, but I guess it was obvious. Btw. - as a hint of how simple the assembler is: Several comments about the first parts complained that it was “mostly just gcc -S output”. But that’s part of the point: You can build a compiler targeting assembler largely without even knowing much about the target architecture.

Vidar Hokstad — 3 May 2008

You do lose a few things when targeting a real architecture compared to compiling for VMs with built-in high-level concepts such as GC, but that’s also a blessing. VMs like that also impose a hell of a lot of restrictions on the techniques you can use, which means they are rarely suitable for anything out of the ordinary or innovative (not being able to pick your own GC strategy without rewriting the VM being a big issue).

That’s why developing your own VM is the most common route with modern, high level (in the modern sense) language implementations (c.f. MRI 1.9, Rubinius, Io, Python, Smalltalk, Java). It’s a well proven technique that doesn’t rule out the possibility of native compilation later (which is what Erlang did).

Going directly from a language to architecture-specific assembler is very uncommon nowadays (even Haskell compiles to C-- intermediately).

The only modern language I can think of that follows the native compilation route directly is D, which is a systems-level language. You say that’s what you’re also trying to implement, which is fine, and so your route makes a lot of sense, but these are still a different ball game to developing implementations of high level (in the modern sense) languages.

PeterCooper — 4 May 2008

RubyFlow The Ruby and Rails community linklog
