On Learning to Write Languages

I’ve been learning to write languages recently.

I read Steve Yegge’s thought-provoking post, in which he talks about how, if you know how to deal with language problems like lexing, parsing, translating, and compiling, then you know how to solve a large number of common programming problems.

I’ve been using very simple custom languages at work to write integration tests. Just little bits of work, but they’ve really helped quality by allowing us to write loads and loads of tests quickly and confidently. I think we have about 500 integration tests written which rely on small setup languages.

I think this has become possible because the system we’re writing against is pretty stable. The underlying classes and database tables we’re writing against don’t change too often.
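A setup language of this kind can be very small. The sketch below is illustrative only and is written in Python for brevity: the line format, the Document fields, and the load_documents helper are all invented for the example and are not the language or classes described in this post.

```python
from dataclasses import dataclass


# Hypothetical record type; the real system's classes are not shown in the post.
@dataclass
class Document:
    title: str
    owner: str
    state: str


def load_documents(script: str) -> list[Document]:
    """Turn lines like 'document Spec owned-by alice state Draft' into Documents."""
    documents = []
    for line_no, line in enumerate(script.splitlines(), start=1):
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comment lines
        well_formed = (
            len(words) == 6
            and words[0] == "document"
            and words[2] == "owned-by"
            and words[4] == "state"
        )
        if not well_formed:
            raise ValueError(f"line {line_no}: cannot parse {line!r}")
        documents.append(Document(title=words[1], owner=words[3], state=words[5]))
    return documents


# Assumed example script: two documents for a sign-off test.
SETUP = """
# two documents for a sign-off test
document Spec owned-by alice state Draft
document Plan owned-by bob state SignedOff
"""

docs = load_documents(SETUP)
```

Each test then starts from a few readable lines of setup rather than a page of object construction, which is where the speed and confidence come from.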

This seems to be the key time for writing your own languages: the underlying libraries have reached a point of stability, and you are being asked to do complex things to the underlying data.

So if you deal with classes or database tables called ‘Document’, ‘Alert’, and ‘Error’, then you can start making statements using those objects; things like

‘When the document is saved, if the document is not Signed Off, alert the document owner and log an error’

Now, it should be possible to write a translator that turns this into C#, something like:

public void OnSaved(Document document)
{
    if (document.State != DocumentState.SignedOff) { /* alert the document owner and log an error */ }
}

The first version is significantly easier to understand and write. You can show it to your customer and ask whether they agree with the statement. The language helps communication. The C# is no help at all in communicating.
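To make the idea of a translator concrete, here is a minimal sketch in Python that handles exactly one rule shape, the sentence quoted above, and emits C# text. It is a sketch under stated assumptions, not a real implementation: the pattern, the ACTIONS table, and the C# method names Alerts.Send and Log.Error are all invented for illustration.

```python
import re

# One rule shape only:
#   "When the X is EVENT, if the X is [not] STATE, ACTION and ACTION"
RULE = re.compile(
    r"When the (?P<entity>\w+) is (?P<event>\w+), "
    r"if the (?P=entity) is (?P<negated>not )?(?P<state>[\w ]+), "
    r"(?P<actions>.+)"
)

# Hypothetical mapping from action phrases to C# statements (names are invented).
ACTIONS = {
    "alert the document owner": "Alerts.Send(document.Owner);",
    "log an error": 'Log.Error("Document saved before sign-off");',
}


def translate(rule: str) -> str:
    """Translate one English rule sentence into the text of a C# method."""
    match = RULE.fullmatch(rule.strip())
    if match is None:
        raise ValueError(f"unrecognised rule: {rule!r}")
    entity = match["entity"]
    type_name = entity.capitalize()  # document -> Document
    # "Signed Off" -> "SignedOff"
    state = "".join(w[0].upper() + w[1:] for w in match["state"].split())
    operator = "!=" if match["negated"] else "=="
    lines = [
        f"public void On{match['event'].capitalize()}({type_name} {entity})",
        "{",
        f"    if ({entity}.State {operator} {type_name}State.{state})",
        "    {",
    ]
    for phrase in match["actions"].split(" and "):
        if phrase not in ACTIONS:
            raise ValueError(f"unknown action: {phrase!r}")
        lines.append("        " + ACTIONS[phrase])
    lines += ["    }", "}"]
    return "\n".join(lines)


print(translate(
    "When the document is saved, if the document is not Signed Off, "
    "alert the document owner and log an error"
))
```

A single regular expression only stretches this far. Once rules start to nest or combine, a proper lexer and parser earn their keep.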

So, custom languages can help put together systems that are easier to understand, because the language is tuned to the problem, and easier to modify, because the code is invariably shorter than it would be in the general-purpose programming language.

To my mind, if I can learn to write interpreters, compilers, and translators, I can write software that is significantly easier to maintain.

There are, however, two big problems:

First, learning to write languages is not trivial. It’s a significant investment of time. Your manager is not going to be happy about a proposal that starts “Can I spend the next month learning about languages and not writing production code…” so I think you have to learn about these things in your own time.

Second, even once you know how to write them, interpreters are fairly hefty beasts. If it takes you 300 lines and a day of work to write a lexer and parser, you’d better be certain you save more than 300 lines and a day of work in the course of writing scripts in the new language; otherwise, what was the point? So you have to pick your battles, choosing only those areas that are ripe for better automation.

If you meet these two criteria — you’ve learned languages on your own time and you’re picking an area that’ll benefit from it — I think writing your own languages is a very valuable ability.

So, I’m now reading heavily in the area, writing my own lexers and parsers by hand, and starting to look at automated tools like ANTLR and Irony (the .NET Language Implementation Kit).
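For a sense of scale, a hand-written lexer for a small language can start out quite short. The sketch below, in Python, splits a rule sentence into word, number, and punctuation tokens. The token kinds and the tokenize name are choices made for this example; they are not taken from ANTLR or Irony.

```python
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # "WORD", "NUMBER", or "PUNCT"
    text: str
    pos: int   # offset of the token's first character in the source


def tokenize(source: str) -> Iterator[Token]:
    """Scan the source left to right, yielding one token at a time."""
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1  # whitespace separates tokens but is not itself a token
            continue
        start = i
        if ch.isalpha() or ch == "_":
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            yield Token("WORD", source[start:i], start)
        elif ch.isdigit():
            while i < len(source) and source[i].isdigit():
                i += 1
            yield Token("NUMBER", source[start:i], start)
        elif ch in ",.;()":
            i += 1
            yield Token("PUNCT", ch, start)
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {start}")


tokens = list(tokenize("When the document is saved, alert the owner"))
```

A parser would then consume this token stream instead of raw characters, which keeps whitespace and character-level details out of the grammar.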


4 thoughts on “On Learning to Write Languages”

  1. A half-way house, which obviates maintaining an interpreter, is to structure your classes and methods so that they read more like English. Examples I have seen are LINQ and NUnit. I believe it is also quite popular in the Ruby community.

    This approach has the added benefit of you being able to refactor existing code to make it read this way.

    • Yeah, I’ve seen the way this sort of thing is done in Ruby. I think it is an interesting approach.

      The results often look quite good in the language but awful in the punctuation: ‘this_test(has[some].crazy() “internal”.formatting().rules)’

  2. That’s great if you’re going to be around for ages… teaching new recruits your language and having it maintained after you have left the company is a bit of a blocker too.

  3. @Colin: true if you’ve written a complex language. But I wouldn’t recommend that: if you want something general-purpose, embed Python 🙂 I’ve found that modest, focussed languages are really quite accessible, though.

Comments are closed.