Python-style string formatting for C#

[Jon Skeet][skeet] recently asked in one of [his posts][op]:

> it would be really nice to be able to write:
>
> `throw new IOException("Expected to read {0} bytes but only {1} were available", requiredSize, bytesRead);`

Which would do the same as

    throw new IOException(String.Format(
        "Expected to read {0} bytes but only {1} were available",
        requiredSize, bytesRead));

And it got me wondering about the String.Format method, and how much uglier it makes C# code to read than, say, the equivalent Python code. Side by side:

    // C#
    string message = String.Format(
        "Expected to read {0} bytes but only {1} were available",
        requiredSize, bytesRead);

    # Python
    message = "Expected to read %s bytes but only %s were available" % (requiredSize, bytesRead)

I think I’d solve the problem, not by creating a new constructor for `IOException`, but by making String.Format part of the C# syntax. It works very nicely for Python, and it’s such a common thing to do that I think it would warrant a change to the language. Given how cumbersome String.Format is, it’s often shorter and clearer to use simple string concatenation, which makes things rather inconsistent.
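
To show what I mean by the Python side working nicely, here’s a small sketch of the `%` operator in its two forms: positional values from a tuple, and named values from a mapping. The variable names and the numbers are made up for the illustration.

```python
# Hypothetical values, just for the illustration.
required_size, bytes_read = 128, 40

# Positional form: the values come from a tuple, in order.
positional = "Expected to read %s bytes but only %s were available" % (
    required_size,
    bytes_read,
)

# Named form: the values come from a mapping and are looked up by key.
named = "Expected to read %(required)s bytes but only %(read)s were available" % {
    "required": required_size,
    "read": bytes_read,
}

print(positional)
print(named)
```

Both lines print the same message; the named form is the one closest to what I’m after below.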

Here’s what I came up with. It’s a ‘first draft’, and more for interest’s sake than as something I’d put into production.

Instead of passing an object array in as the values, I’m reading from the properties of an object, so you can do it with objects or tuples:

    var person = new Person()
    {
        firstname = "Steve",
        surname = "Cooper"
    };

Then you can inject the object into a format string like this:

    string message = "{firstname} {surname} says injecting properties is fun!".ㄍ(person);
    // message == "Steve Cooper says injecting properties is fun!"

So you’ll see this weird thing on the end of the format string that looks like a double-chevron. This is supposed to look like a double arrow, pushing values into the format string. In fact, it’s the [Bopomofo letter ‘G’][g] and therefore a perfectly valid C# method name.

Here’s the code for the double-chevron method. I say again, this is _just a proof of concept_, not production code. Use at your own peril. (In fact, don’t use. Write your own. It’ll be more solid.)

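    public static class StringFormatting
    {
        public static string ㄍ(this string format, object o)
        {
            // Matches a placeholder such as {firstname} and captures the property name.
            var rx = new System.Text.RegularExpressions.Regex(@"\{(?<name>\w+)\}");
            var match = rx.Match(format);
            while (match.Success)
            {
                string name = match.Groups["name"].Value;

                // Escape every brace, then turn this one placeholder back into {0}.
                format = format
                    .Replace("{", "{{")
                    .Replace("}", "}}");
                format = format.Replace("{{" + name + "}}", "{0}");

                // Read the property by reflection; string.Format fills in {0}
                // and unescapes the remaining braces for the next pass.
                object prop = o.GetType().GetProperty(name).GetValue(o, null);
                format = string.Format(format, prop);
                match = rx.Match(format);
            }
            return format;
        }
    }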
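
For comparison, here’s a minimal Python sketch of the same idea: pull `{name}` placeholders out of a string and fill them from an object’s attributes. The `inject` function and the `Person` class are names I’ve made up for the illustration, and it uses a single `re.sub` pass instead of the match-and-format loop above, so treat it as a swapped-in technique rather than a port.

```python
import re


class Person:
    """Hypothetical stand-in for the C# Person class."""

    def __init__(self, firstname, surname):
        self.firstname = firstname
        self.surname = surname


def inject(template, obj):
    """Replace each {name} in template with the matching attribute of obj."""

    def lookup(match):
        # getattr raises AttributeError if the object has no such attribute.
        return str(getattr(obj, match.group("name")))

    return re.sub(r"\{(?P<name>\w+)\}", lookup, template)


person = Person("Steve", "Cooper")
message = inject("{firstname} {surname} says injecting properties is fun!", person)
print(message)
```

Because the substitution happens in one pass, a value that itself contains braces is not scanned again, which is one of the rough edges of the loop version.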

[skeet]: http://msmvps.com/blogs/jon_skeet/default.aspx
[op]: http://msmvps.com/blogs/jon_skeet/archive/2009/01/23/quick-rant-why-isn-t-there-an-exception-string-params-object-constructor.aspx
[g]: http://www.alanwood.net/unicode/bopomofo.html

Modifying large codebases in dynamic and static languages

I’ve been wondering recently about dynamic languages and static languages, and their relative benefits.

I’m struggling with this question because I write C# 3 by day, and am learning Python in the evenings. I’m only writing small Python scripts at the moment and I’d like to write larger pieces, but I’m concerned about how easy it’ll be to make certain types of change.

For example: you’ve got 100,000 lines of code. You also have a logging function that looks like this:

    void Log(string message)

And it’s called about 200 times in your code. You decide you need a severity, so you change the signature to

    void Log(string message, LoggingSeverity severity) { .. }

Now, how long does it take to find all the calls to the Log() function that need to be updated? Under C#, about ten seconds: the build fails, and the compiler lists every call site that still passes a single argument. Once every call has been fixed, the code is almost certain to work correctly.

Consider, on the other hand, the Python function

    def log(message):

What happens if you change the signature to

    def log(message, severity):

Nothing tells you where the log function is called. Each of the old calls will fail, but only when that line actually runs. You’ve just introduced 200 bugs.
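
Here’s a small sketch of what I mean, with a made-up `rarely_used_path` function standing in for one of the 200 call sites. The function bodies are my own invention; the point is only when the error shows up.

```python
def log(message, severity):
    """The new signature: severity is now a required argument."""
    return "[%s] %s" % (severity, message)


def rarely_used_path():
    # Hypothetical stale call site, still written for the old signature.
    return log("disk almost full")


# Defining the functions raises nothing. The stale call is only detected
# when the line that makes it is actually executed.
try:
    rarely_used_path()
    outcome = "no error"
except TypeError as exc:
    outcome = "TypeError: %s" % exc

print(outcome)
```

The script loads without complaint, and the `TypeError` only appears once `rarely_used_path()` is called.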

It’s made even worse by duck typing; maybe you have two loggers — a deployment logger which writes to a database, and a test logger which writes to stdout. You update the database logger so it has severity. Your tests continue to pass, but your deployed system will fail.
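
The two-logger situation can be sketched like this. `StdoutLogger`, `DatabaseLogger` and `process_order` are hypothetical names, and the "database" is just a list, so this only illustrates the shape of the problem.

```python
class StdoutLogger:
    """Hypothetical test logger, still on the old one-argument signature."""

    def log(self, message):
        print(message)


class DatabaseLogger:
    """Hypothetical deployment logger, updated to require a severity."""

    def __init__(self):
        self.rows = []  # stands in for a database table

    def log(self, message, severity):
        self.rows.append((severity, message))


def process_order(logger):
    # This caller was never updated to pass a severity.
    logger.log("order processed")
    return True


# Under test, the old-style logger is used and everything looks fine.
test_passes = process_order(StdoutLogger())

# In deployment, the same call hits the updated logger and fails.
try:
    process_order(DatabaseLogger())
    deployed_ok = True
except TypeError:
    deployed_ok = False

print(test_passes, deployed_ok)
```

Nothing ties the two `log` methods to a common signature, so the test run says nothing about the deployed one.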

So it seems to me that static languages give you much more power to make changes to large codebases. I’d love to know if, and where, the mistakes are in my thinking.