I’ve been having vague and uneducated thoughts about generating prose, as a kind of prompt to help blocked writers. At the minute, I’ve got this surreal prose generator that uses word lists and sentence templates. It generates stuff like this:
“A tank of screaming tongues gaped its wicked, scorched fire, and strove it. As it tore and spat, it was jumping nothing, in front of the tooth of the male rag. a planet had no hole, despite the priest, scorched and silence, despaired. a cathedral of holy flowers prayed its pure, blessed samurai, and puked it. As it rued and dishonoured, it was gaping something, close to a destiny of the worst lust. It despised into the delight on deep nails from the mongol queen.”
Which is fun enough, I guess. What I was wondering, though, was whether I could improve the quality of the output with spam filtering. So, as people use the generator, they can mark lines as good or bad, and it learns, filtering out the bad lines.
Here’s how I thought it might work. The user picks out the more sensible lines – ones with some thematic consistency. E.g., the line “a cathedral of holy flowers prayed its pure, blessed samurai, and puked it.” would get marked as good because it contains cathedral, holy, and blessed. Over time the filter would end up promoting phrases like “screaming tongues” and “scorched fire”, but demoting things with no thematic link.
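To make that a bit more concrete, here is a rough Python sketch of the sort of thing I have in mind. It is not the real generator, and I can't promise it counts as proper Bayesian filtering: it is a naive-Bayes-style scorer that treats pairs of words appearing in the same line as its features, on the assumption that a thematic link shows up as words that keep turning up together in lines people liked. The names (`PairFilter`, `features`, `STOPWORDS`) and the stopword list are all made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical stopword list: small words that carry no theme.
STOPWORDS = {"a", "an", "the", "of", "its", "it", "and", "as",
             "was", "in", "to", "had", "no"}


def features(line):
    """Return the unordered pairs of distinct content words in a line."""
    words = {w.strip('.,;:!?"“”‘’').lower() for w in line.split()}
    words = sorted(words - STOPWORDS - {""})
    return list(combinations(words, 2))


class PairFilter:
    """Naive-Bayes-style scorer over word pairs (a sketch, not a tuned filter)."""

    def __init__(self):
        self.counts = {"good": Counter(), "bad": Counter()}
        self.lines = {"good": 0, "bad": 0}

    def mark(self, line, label):
        """Record a user's verdict on a line; label is 'good' or 'bad'."""
        self.counts[label].update(features(line))
        self.lines[label] += 1

    def score(self, line):
        """Log-odds that a line is good; above zero means 'probably keep'."""
        good, bad = self.counts["good"], self.counts["bad"]
        vocab = len(set(good) | set(bad))
        total_good = sum(good.values())
        total_bad = sum(bad.values())
        # Prior from how many lines were marked each way, add-one smoothed.
        log_odds = math.log((self.lines["good"] + 1) / (self.lines["bad"] + 1))
        for pair in features(line):
            if pair not in good and pair not in bad:
                continue  # never seen this pair, so it tells us nothing
            p_good = (good[pair] + 1) / (total_good + vocab)
            p_bad = (bad[pair] + 1) / (total_bad + vocab)
            log_odds += math.log(p_good / p_bad)
        return log_odds


f = PairFilter()
f.mark("a cathedral of holy flowers prayed its pure, blessed samurai", "good")
f.mark("a planet had no hole, despite the priest", "bad")
print(f.score("the holy cathedral wept"))   # positive: shares a 'good' pair
print(f.score("the priest of the planet"))  # negative: shares a 'bad' pair
```

The generator would then throw away any freshly generated line whose score comes out below some threshold. The obvious weakness is that it only knows about pairs it has literally seen before, so it has no idea that ‘church’ ought to behave like ‘cathedral’.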
So I was wondering – anyone know anything about Bayesian filtering, and whether it can be applied to groups of words rather than single ones? Or, maybe, anyone know of a linked dictionary of some kind, which tells you that ‘cathedral’ is close to ‘church’ but distant from ‘sausage’?
Or, in fact, any other ideas at all about the subject.
Usual steve-at-steve-cooper-dot-org email address, if you wanna send me something by email.