A lisp macro virgin tells all

I finished my first lisp macro, and I want to tell the world.

I’ll talk about what a lisp macro is, what makes it unique in the world of programming, and why it’s a technique only possible in lisp. I’ll then take you through an example.

So firstly, what’s a lisp macro, and why would you want to write one?

So, you may have seen lisp programs before, and you’ll recognise them instantly — Larry Wall, the inventor of [Perl][], said they had all the aesthetic appeal of a bowl of porridge mixed with toenail clippings:

(defun accumulate (combiner lst initial)
  (let ((accum initial))
    (dolist (i lst)
      (setf accum (funcall combiner accum i)))
    accum))

He has a point. They are butt-ugly. But hell, the best he came up with is [Perl][], so he can `$_@++` right off. (I’m pretty sure that’s valid Perl, too 😉 )
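In case you’re wondering what that ugly snippet actually does: it’s a hand-rolled fold, boiling a list down to a single value. A couple of example calls (a REPL sketch):

(accumulate #'+ '(1 2 3 4) 0)     ; => 10
(accumulate #'max '(3 1 4 1 5) 0) ; => 5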

It’s ugly, in an aesthetic way, but it’s amazingly practical. It’s got an engineering beauty to it. If you look at that snippet above, you’ll notice that the whole program is made out of exactly three kinds of token:

* open parenthesis: `(`
* close parenthesis: `)`
* symbols, like `defun`, `accum` and `setf`

All simple lisp programs are like this. Just brackets to group stuff together, and stuff that needs grouping. Compare that with C#, where you might find:

* parentheses, for:
    * function calls: `print("hello")`
    * special forms: `using (OdbcConnection con = …)`
* semi-colons to end statements: `int x = 1;`
* curly brackets, for:
    * code blocks: `{ /* code block */ }`
    * array initialisers: `string[] words = { "hello", "world" };`
* square brackets for array indexing: `x[3] = 4;`

and the list goes on. I gave up because there are too many to list.

So lisp has this seriously small syntactic footprint. You can have a thing, or a group of things in brackets. It’s simple. It’s *so* simple that you can start doing crazy stuff in lisp that you just can’t do otherwise. That crazy stuff goes by the name of macros.

I can write a program that takes a chunk of lisp (remember, just a thing or a list of things), cuts it up, and reassembles it. That creates new lisp code.
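That sounds more exotic than it is: a chunk of lisp code is literally a list, so the ordinary list functions can pull it apart and glue it back together. A quick REPL sketch:

(car '(+ 1 2))                  ; => +
(cdr '(+ 1 2))                  ; => (1 2)
(cons '* (cdr '(+ 1 2)))        ; => (* 1 2), a brand new piece of code
(eval (cons '* (cdr '(+ 1 2)))) ; => 2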

So imagine you do a lot of work on three-dimensional arrays. You find yourself, over and over, writing nested loops that say:

for x in range(100):
    for y in range(100):
        for z in range(100):
            # do something to matrix[x,y,z]

And frankly, you’re bored of typing it over and over. What you really want to do is something like:

for {x 100, y 100, z 100}:
    # do something to matrix[x,y,z]

You want a brand new bit of syntax for multiple-value looping. Can you add it to python? Nope. C? Nope. Java? Nope.

But now look at the lisp version.

I could, theoretically, write this:

(domanytimes (x 100 y 100 z 100)
  body)

and, because it’s just a list of stuff, I can chop and change it into this new bit of lisp:

(dotimes (x 100)
  (dotimes (y 100)
    (dotimes (z 100)
      body)))

I’ll show you how in a second, but notice what’s possible: I can write my own looping construct (`domanytimes`) and lisp will rewrite it into nested copies of a simpler looping construct (the built-in `dotimes`).

Is that particularly special? Well, yeah. I’ve written new syntax. I’ve defined a new way of looping that is no different from the standard loops. I’ve basically added something new to the language. Lisp is now better at dealing with multi-dimensional loops. Try adding a new loop to ruby, or javascript. Make python understand

for x in range(100), y in range(100), z in range(100):
    # body here

and you’ll find you can’t.

So I’ve made my version of lisp a bit better at handling loops. If I were writing database code, I could make lisp better at writing SQL statements or data access layers. C# recently got built-in DAL logic with [LINQ][], and it’s great, but only the C# team can add that sort of thing to the language. A lisper, on the other hand, could write this sort of code:

(sql-select (ID NAME) from PROJECT where (DUEDATE > TODAY))

and it’d do basically the same thing as [LINQ][].
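I haven’t actually built that, so treat this as a sketch rather than a real data access layer, but a toy version that just assembles an SQL string at macro-expansion time might start out like this (the `from` and `where` symbols are there purely to make the call read nicely):

(defmacro sql-select (columns from-kw table where-kw condition)
  "Toy version: glues a lispy query into an SQL string. A real DAL
would also handle parameter binding, quoting, and so on."
  (declare (ignore from-kw where-kw))
  `(format nil "SELECT ~{~a~^, ~} FROM ~a WHERE ~{~a~^ ~}"
           ',columns ',table ',condition))

;; (sql-select (ID NAME) from PROJECT where (DUEDATE > TODAY))
;; => "SELECT ID, NAME FROM PROJECT WHERE DUEDATE > TODAY"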

So those are the whys and wherefores. Here’s the how of the `domanytimes` macro.

`domanytimes` takes two parts: the loop variables `(x 100 y 100 z 100)` and whatever body you want to execute. We’re going to write a program that skims two elements off the front of the loop variables (say, `x` and `100`) and uses them to write a built-in `dotimes` loop; so a program which converts

(domanytimes (x 100 y 100) body)

into

(dotimes (x 100)
  (domanytimes (y 100) body))

and then again to give you

(dotimes (x 100)
  (dotimes (y 100)
    body))

Here’s the `domanytimes` macro, in all its eye-bleeding horror:

(defmacro domanytimes (loop-list &body body)
  "Allows you to write (domanytimes (x 10 y 10) …)
instead of (dotimes (x 10) (dotimes (y 10) body))."
  (if (null loop-list)
      ;; no loops left -- we have our form to execute
      `(progn ,@body)
      ;; we have more loops to arrange
      (let ((fst (car loop-list))
            (snd (cadr loop-list))
            (rst (cddr loop-list)))
        `(dotimes (,fst ,snd)
           (domanytimes ,rst ,@body)))))
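If you want to check what it’s doing, you can ask the REPL to expand a call one step at a time (output tidied up a little; the exact printing varies by implementation):

(macroexpand-1 '(domanytimes (x 100 y 100) body))
;; => (DOTIMES (X 100) (DOMANYTIMES (Y 100) BODY))

(domanytimes (x 2 y 2)
  (print (list x y)))
;; prints (0 0), (0 1), (1 0), (1 1)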

There. Wasn’t that fun? 😉

It looks nasty, I know. All lisp looks nasty. But it’s actually created something new in the language. As far as I understand it, lisp has survived for fifty years basically because the macro system lets you add any new kind of syntax you like. You can knock up a set of macros to [implement OO][clos], and suddenly lisp is OO. You can knock up macros for manipulating lazy lists, and suddenly lisp has [lazy evaluation][lazy]. You can knock up data access layer macros, and it’s got a version of [LINQ][linq]. There seems to be nothing you can’t hack lisp into being.

And if you want to know how the hell that works, I’d recommend [Practical Common Lisp][pcl], which is online and free.

[linq]: http://msdn2.microsoft.com/en-gb/netframework/aa904594.aspx
[lazy]: http://en.wikipedia.org/wiki/Lazy_evaluation
[clos]: http://en.wikipedia.org/wiki/CLOS
[perl]: http://www.perl.com/
[pcl]: http://gigamonkeys.com/book/

Five languages for talking about programs.

Only one of which is a computer programming language.

At work, we’re designing a new product. It’s a process that involves non-technical users, IT managers, programming colleagues, me, and the computer. It involves a lot of talking, a lot of language. At one end, it’s making sure the users are happy with the system’s capabilities. While talking to them, we use non-technical language, diagrams, and demonstrations. Way at the other end, it’s actually typing code.

This makes me think it might be useful to split the languages we’re speaking into five:

**User Language:** Non-technical, example-laden, visual, and concrete. This is where you get into general discussions, produce Photoshop mockups of the system, draw rough-and-ready whiteboard pictures, write XP user stories, and write user manuals.

**Manager Language:** Your manager needs to know the technical and algorithmic overview, but he doesn’t need to see every class that’s going into the design. This is the language of high-level functional specs, some UML diagrams, and conceptual diagrams.

**Colleague Language:** At this level, you’re going to have to assure yourself that the code you write and the code your colleagues write mesh together. Precision starts being important, and a real burden. This is probably where most of the real thinking goes on. I think one of the reasons Pair Programming works is that it forces you into a lot more of these discussions, which reduces the chance that the team’s work won’t gel together.

**Self Language:** This is the idiosyncratic, outboard-brain style of writing that you get into to make sure you understand what you need to be doing. This can be personal pseudocode, notes-to-self, todo lists, or code comments.

**Computer Language:** The code itself.

I think a great deal of the actual design and thinking involved in creating new products goes on above the level of the computer language. When people discuss, say, “Python vs Java” or “Ruby vs Lisp”, it’s valuable enough, but the choice of language probably isn’t that important in determining the success of the project. I think these language strata go some way to explaining why: most of the communication goes on between people.

Modifying large codebases in dynamic and static languages

I’ve been wondering recently about dynamic languages and static languages, and their relative benefits.

I’m struggling with this question because I write C# 3 by day, and am learning python in the evenings. I’m only writing small python scripts at the moment; I’d like to write larger pieces, but I’m concerned about how easy it’ll be to make certain types of change.

For example: you’ve got 100,000 lines of code. You also have a logging function that looks like this:

void Log(string message)

And it’s called about 200 times in your code. You decide you need a severity, so you change the signature to:

void Log(string message, LoggingSeverity severity) { .. }

Now, how long does it take to find all the calls to the `Log()` function that need to be updated? Under C#, about ten seconds: the compiler flags every broken call site. Once every call has been fixed, the code is almost certain to work correctly.

Consider, on the other hand, the python function

def log(message):

What happens if you change the signature to

def log(message, severity):

There’s no compiler to tell you where `log` is called. You’ve just introduced 200 bugs, and they’ll only show up at runtime.

It’s made even worse by duck typing: maybe you have two loggers, a deployment logger which writes to a database and a test logger which writes to stdout. You update the database logger so it takes a severity. Your tests continue to pass, but your deployed system will fail.

So it seems to me that static languages give you much more power to make changes to large codebases. I’d love to know if, and where, the mistakes are in my thinking.