Learning Languages Follow-up: Test Languages

I managed to write a new language, and 135 unit tests in that language, in a single day. When I talked about big wins coming from writing in the correct language, this is what I meant.

So, the language is a test language for a function library. We have arithmetic, date, string, and other functions that we need to test. Each function is identified internally by a GUID and may be configured. The language looks like this:

#
# Check Array Sum function
#
declare Sum = {11669A5A-45BA-46c0-A6F6-97CDE4F5CAA5}
Sum(null) = null
Sum([]) = 0
Sum([1.0, 2.0]) = 3.0

In this short script, I add comments and give a function a name, binding the name `Sum` to the function identified by the id `{11669A5A-45BA-46c0-A6F6-97CDE4F5CAA5}`. Then I define three tests. `Sum(null) = null` means what you would expect: call the sum function, passing in a single null parameter, and the result should be null.
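To make the idea concrete, here is a minimal sketch of how a runner for scripts like this might look in C#. It is not the original implementation: it assumes a hypothetical registry (a plain dictionary from GUID to delegate) in place of the real function library, treats every non-comment line as one statement, and only understands `null`, numbers, and arrays of those as values.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Minimal runner for the test language. ASSUMPTION: functions are looked up in
// a plain GUID -> delegate dictionary, a hypothetical stand-in for the real
// function library's registry.
public static class TestScriptRunner
{
    static readonly Regex DeclareLine =
        new Regex(@"^declare\s+(\w+)\s*=\s*(\{[0-9A-Fa-f-]+\})$");
    static readonly Regex TestLine =
        new Regex(@"^(\w+)\((.*)\)\s*=\s*(.+)$");

    // Runs every test line in the script and returns the number of failures.
    public static int Run(string script, IDictionary<Guid, Func<object[], object>> registry)
    {
        var names = new Dictionary<string, Guid>();
        int failures = 0;
        foreach (var raw in script.Split('\n'))
        {
            var line = raw.Trim();
            if (line.Length == 0 || line.StartsWith("#")) continue; // blank or comment

            var declare = DeclareLine.Match(line);
            if (declare.Success)
            {
                // Bind a readable name to the function's GUID.
                names[declare.Groups[1].Value] = Guid.Parse(declare.Groups[2].Value);
                continue;
            }

            var test = TestLine.Match(line);
            if (!test.Success) throw new FormatException("Cannot parse: " + line);

            var function = registry[names[test.Groups[1].Value]];
            object[] args = SplitTopLevel(test.Groups[2].Value).Select(s => ParseValue(s)).ToArray();
            object expected = ParseValue(test.Groups[3].Value);
            object actual = function(args);

            // Value equality for null and numbers; arrays would compare by reference.
            if (!Equals(expected, actual))
            {
                failures++;
                Console.WriteLine("FAIL: " + line + " (got " + (actual ?? "null") + ")");
            }
        }
        return failures;
    }

    // Parses a literal: null, a number, or an array such as [1.0, 2.0].
    static object ParseValue(string text)
    {
        text = text.Trim();
        if (text == "null") return null;
        if (text.StartsWith("[") && text.EndsWith("]"))
            return SplitTopLevel(text.Substring(1, text.Length - 2)).Select(s => ParseValue(s)).ToArray();
        return double.Parse(text, CultureInfo.InvariantCulture);
    }

    // Splits on commas that are not nested inside [ ].
    static List<string> SplitTopLevel(string text)
    {
        var parts = new List<string>();
        if (text.Trim().Length == 0) return parts; // "()" or "[]" means no items
        int depth = 0, start = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == '[') depth++;
            else if (text[i] == ']') depth--;
            else if (text[i] == ',' && depth == 0)
            {
                parts.Add(text.Substring(start, i - start));
                start = i + 1;
            }
        }
        parts.Add(text.Substring(start));
        return parts;
    }
}
```

Given a registry whose `Sum` entry returns null for a null argument and otherwise adds up an array of doubles, running the Sum script above through `Run` returns 0. Parsing line by line with regular expressions is enough here because every statement fits on one line; a richer language would want a proper lexer and parser.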

Having defined this language (which, I think, took me about an hour), I was then able to write about 135 tests with relative ease. The equivalent C# unit tests would be full of repetition and would not express their meaning anywhere near as clearly. You’d have something like this:

[TestMethod]
public void TestSumNullIsNull()
{
    double? expected = null;
    var thefunction = FieldModifierHost.Instance()["{11669A5A-45BA-46c0-A6F6-97CDE4F5CAA5}"];
    var maker = thefunction.MakeMethod();
    var instance = maker(new object[] { });
    var result = instance(null);
    Assert.AreEqual(expected, result);
}

Which is frankly impenetrable.

PS: A colleague has just added a number of tests without any instruction, and in a matter of minutes he put confidence tests around a function he wants to change. Unit test languages FTW!

On Learning to Write Languages

I’ve been learning to write languages recently.

I read Steve Yegge’s thought-provoking post, in which he talks about how, if you know how to deal with language problems like lexing, parsing, translating, and compiling, then you know how to solve a large number of common programming problems.

I’ve been using very simple custom languages at work to write integration tests. Just little bits of work, but they’ve really helped quality by allowing us to write loads and loads of tests quickly and confidently. I think we have about 500 integration tests written which rely on small setup languages.

I think this has become possible because the system we’re writing against is pretty stable: the underlying classes and database tables don’t change very often.

This seems to be the key time for writing your own languages: the underlying libraries have reached a point of stability, and you are being asked to do complex things with the underlying data.

So if you deal with classes or database tables called ‘Document’, ‘Alert’, and ‘Error’, you can start making statements using those objects, things like:

‘When the document is saved, if the document is not Signed Off, alert the document owner and log an error’

Now, it should be possible to write a translator that turns this into C#, producing something like:

public void OnSaved(Document document)
{
    if (document.State != DocumentState.SignedOff)
    {
        SendAlert(document.Owner);
        LogError(document);
    }
}

The first version is significantly easier to understand and to write. You can show it to your customer and ask whether they agree with the statement; the language helps communication. The C# is no help at all in communicating with the customer.
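A toy translator for that one sentence can be sketched in C#. It rests on strong assumptions: it recognises only the exact shape “When the X is Y, if the X is not Z, A and B”, and the phrase table that maps ‘alert the document owner’ and ‘log an error’ onto `SendAlert` and `LogError` is hypothetical, written by hand for this example. It emits C# source text, which you would then compile alongside the rest of the system.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Toy translator that handles exactly one sentence shape:
//   When the <entity> is <event>, if the <entity> is not <State>, <action> and <action>
// ASSUMPTION: the action phrase table below is hypothetical, not a real API.
public static class RuleTranslator
{
    static readonly Regex Rule = new Regex(
        @"^When the (\w+) is (\w+), if the \1 is not ([\w ]+?), (.+)$");

    // Maps an action phrase ({0} = entity name) to a C# statement template.
    static readonly Dictionary<string, string> Actions = new Dictionary<string, string>
    {
        ["alert the {0} owner"] = "SendAlert({0}.Owner);",
        ["log an error"] = "LogError({0});",
    };

    public static string Translate(string sentence)
    {
        var m = Rule.Match(sentence.Trim().TrimEnd('.'));
        if (!m.Success) throw new FormatException("Unrecognised rule: " + sentence);

        string entity = m.Groups[1].Value;                 // e.g. "document"
        string type = Pascal(entity);                      // e.g. "Document"
        string handler = "On" + Pascal(m.Groups[2].Value); // e.g. "OnSaved"
        string state = Pascal(m.Groups[3].Value);          // e.g. "SignedOff"

        var sb = new StringBuilder();
        sb.AppendLine($"public void {handler}({type} {entity})");
        sb.AppendLine("{");
        sb.AppendLine($"    if ({entity}.State != {type}State.{state})");
        sb.AppendLine("    {");
        foreach (var phrase in m.Groups[4].Value.Split(new[] { " and " }, StringSplitOptions.None))
        {
            // Replace the entity name with {0} so one table entry serves any entity.
            string key = phrase.Trim().Replace(entity, "{0}");
            if (!Actions.TryGetValue(key, out string template))
                throw new FormatException("Unknown action: " + phrase);
            sb.AppendLine("        " + string.Format(template, entity));
        }
        sb.AppendLine("    }");
        sb.AppendLine("}");
        return sb.ToString();
    }

    // "Signed Off" -> "SignedOff", "saved" -> "Saved"
    static string Pascal(string words) =>
        string.Concat(words.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));
}
```

Pattern matching on a fixed sentence shape like this breaks down quickly; anything beyond a handful of rule shapes would call for a real grammar, which is where lexers and parsers come in.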

So custom languages can help you build systems that are easier to understand, because the language is tuned to the problem, and easier to modify, because the code is usually much shorter than it would be in a general-purpose programming language.

To my mind, if I can learn to write interpreters, compilers, and translators, I can write software that is significantly easier to maintain.

There are, however, two big problems:

First, learning to write languages is not trivial. It’s a significant investment of time. Your manager is not going to be happy about a proposal that starts “Can I spend the next month learning about languages and not writing production code…”, so I think you have to learn about these things in your own time.

Second, once you know how to write interpreters, they are themselves fairly hefty beasts. If it takes you 300 lines and a day of work to write a lexer and parser, you’d better be certain you save more than 300 lines and a day of work in the course of writing scripts in the new language; otherwise, what was the point? So you have to pick your battles, choosing only those areas that are ripe for better automation.

If you meet these two criteria — you’ve learned languages on your own time and you’re picking an area that’ll benefit from it — I think writing your own languages is a very valuable ability.

So I’m now reading heavily in the area, writing my own lexers and parsers by hand, and starting to look at automated tools like ANTLR and Irony (the .NET Language Implementation Kit).
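For a flavour of what writing a lexer by hand involves, here is a minimal sketch of one for the Sum test script above, in C#. The token kinds are an illustrative choice rather than anything prescribed: it skips whitespace and `#` comments, keeps newlines as tokens (since each test sits on its own line), and treats `declare` and `null` as keywords.

```csharp
using System;
using System.Collections.Generic;

// Token kinds chosen for this sketch; a real lexer might split them differently.
public enum TokenKind { Identifier, Keyword, Number, Guid, Null, Symbol, NewLine }

public sealed class Token
{
    public Token(TokenKind kind, string text) { Kind = kind; Text = text; }
    public TokenKind Kind { get; }
    public string Text { get; }
    public override string ToString() => Kind + ":" + Text;
}

public static class Lexer
{
    // Scans the source one character at a time and emits a flat token list.
    public static List<Token> Tokenize(string source)
    {
        var tokens = new List<Token>();
        int i = 0;
        while (i < source.Length)
        {
            char c = source[i];
            if (c == '\n') { tokens.Add(new Token(TokenKind.NewLine, "\\n")); i++; }
            else if (char.IsWhiteSpace(c)) { i++; }               // spaces, tabs, '\r'
            else if (c == '#')                                     // comment runs to end of line
            {
                while (i < source.Length && source[i] != '\n') i++;
            }
            else if (c == '{')                                     // GUID literal {...}
            {
                int end = source.IndexOf('}', i);
                if (end < 0) throw new FormatException("Unclosed GUID at " + i);
                tokens.Add(new Token(TokenKind.Guid, source.Substring(i, end - i + 1)));
                i = end + 1;
            }
            else if (char.IsDigit(c) || (c == '-' && i + 1 < source.Length && char.IsDigit(source[i + 1])))
            {
                int start = i++;                                   // number, e.g. 1.0 or -2
                while (i < source.Length && (char.IsDigit(source[i]) || source[i] == '.')) i++;
                tokens.Add(new Token(TokenKind.Number, source.Substring(start, i - start)));
            }
            else if (char.IsLetter(c))                             // identifier or keyword
            {
                int start = i;
                while (i < source.Length && char.IsLetterOrDigit(source[i])) i++;
                string word = source.Substring(start, i - start);
                var kind = word == "declare" ? TokenKind.Keyword
                         : word == "null" ? TokenKind.Null
                         : TokenKind.Identifier;
                tokens.Add(new Token(kind, word));
            }
            else if ("()[],=".IndexOf(c) >= 0) { tokens.Add(new Token(TokenKind.Symbol, c.ToString())); i++; }
            else throw new FormatException("Unexpected '" + c + "' at " + i);
        }
        return tokens;
    }
}
```

For the line `Sum([1.0, 2.0]) = 3.0` this yields an identifier, the symbols `(` and `[`, two numbers separated by a comma, the closing symbols, `=`, and the number `3.0`; a parser then only has to reason about that token stream rather than raw characters.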