I’ve been having vague and uneducated thoughts about generating prose, as a kind of prompt to help blocked writers. At the minute, I’ve got this surreal prose generator that uses word lists and sentence templates. It generates stuff like this:
“A tank of screaming tongues gaped its wicked, scorched fire, and strove it. As it tore and spat, it was jumping nothing, in front of the tooth of the male rag. a planet had no hole, despite the priest, scorched and silence, despaired. a cathedral of holy flowers prayed its pure, blessed samurai, and puked it. As it rued and dishonoured, it was gaping something, close to a destiny of the worst lust. It despised into the delight on deep nails from the mongol queen.”
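For anyone who hasn’t built one, a generator of this kind can be as simple as the Python sketch below. The word lists, the templates, and the names `fill` and `generate` are made up for illustration; this is not the generator that produced the text above.

```python
import random
import re

# Illustrative word lists (hypothetical; not the real generator's lists).
WORDS = {
    "noun": ["cathedral", "tank", "planet", "priest", "samurai", "rag", "queen"],
    "plural": ["flowers", "tongues", "nails", "teeth"],
    "adj": ["holy", "scorched", "wicked", "blessed", "pure", "screaming"],
    "verb": ["prayed", "gaped", "tore", "spat", "despised"],
}

# Illustrative sentence templates; each {slot} names a list in WORDS.
TEMPLATES = [
    "A {noun} of {adj} {plural} {verb} its {adj}, {adj} {noun}.",
    "As it {verb}, it was close to the {noun} of the {adj} {noun}.",
]

SLOT = re.compile(r"\{(\w+)\}")


def fill(template, words, rng):
    """Replace every {slot} with its own independent random pick from words[slot]."""
    return SLOT.sub(lambda m: rng.choice(words[m.group(1)]), template)


def generate(n, templates=TEMPLATES, words=WORDS, seed=None):
    """Produce n lines, each built from a randomly chosen template."""
    rng = random.Random(seed)
    return [fill(rng.choice(templates), words, rng) for _ in range(n)]


if __name__ == "__main__":
    for line in generate(3):
        print(line)
```

Using a function as the `re.sub` replacement means two `{adj}` slots in the same template get independent words, which is what produces pairings like “pure, blessed”.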
Which is fun enough, I guess. What I was wondering, though, was whether I could improve the quality of the output with the kind of filtering spam filters use. As people use the generator, they mark lines as good or bad, and it learns from those marks to filter out the bad lines.
Here’s how I thought it might work. The user picks out the more sensible lines: ones with some thematic consistency. For example, they might pick “a cathedral of holy flowers prayed its pure, blessed samurai, and puked it.” because it contains cathedral, holy, and blessed. The filter would then promote phrases like “screaming tongues” and “scorched fire”, and demote combinations with no thematic link.
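The marking-and-learning loop can be sketched in Python roughly like this, assuming lines are split into lowercase single words and each word’s chance of belonging to a “good” line is estimated from the user’s marks, in the style of word-based spam filters. `LineFilter`, the 0.01/0.99 clamp, and the neutral 0.5 for unseen words are illustrative choices, not a worked-out design.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"[a-z']+")


def tokens(line):
    """Lowercase words; each word counts once per line."""
    return set(WORD.findall(line.lower()))


class LineFilter:
    """Hypothetical filter learning which words appear in good vs bad lines."""

    def __init__(self, unknown=0.5):
        self.good = Counter()
        self.bad = Counter()
        self.n_good = 0
        self.n_bad = 0
        self.unknown = unknown  # probability used for never-seen words

    def mark(self, line, is_good):
        """Record a user's good/bad judgement on a generated line."""
        if is_good:
            self.good.update(tokens(line))
            self.n_good += 1
        else:
            self.bad.update(tokens(line))
            self.n_bad += 1

    def word_prob(self, word):
        """Estimated P(good | word), clamped away from 0 and 1."""
        g = self.good[word] / max(self.n_good, 1)
        b = self.bad[word] / max(self.n_bad, 1)
        if g + b == 0:
            return self.unknown
        return min(0.99, max(0.01, g / (g + b)))

    def score(self, line):
        """Naive Bayes combination: prod(p) / (prod(p) + prod(1 - p))."""
        probs = [self.word_prob(w) for w in tokens(line)]
        if not probs:
            return self.unknown
        d = sum(math.log(p) for p in probs) - sum(math.log(1 - p) for p in probs)
        # Numerically stable logistic of d, equal to the formula above.
        if d >= 0:
            return 1 / (1 + math.exp(-d))
        e = math.exp(d)
        return e / (1 + e)
```

Lines scoring above some threshold (0.5, say) would be kept and the rest filtered out; the threshold is one more thing to tune.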
So I was wondering: does anyone know anything about Bayesian filtering, and whether it can be applied to groups of words rather than single ones? Or does anyone know of a linked dictionary of some kind, one that tells you ‘cathedral’ is close to ‘church’ but distant from ‘sausage’?
Or, in fact, any other ideas at all about the subject.
Usual steve-at-steve-cooper-dot-org email address, if you wanna send me something by email.
3 responses to “Random Prose Generation”
Sounds like you need a thesaurus
Ages ago someone (I think it was ) posted a link on Ebor to the Visual Thesaurus. Try giving it an association-rich word like “beautiful” to put it through its paces (you only get a few goes for free). Presumably the associative engine behind it, or something similar, could do what you’re looking for.
Re: Sounds like you need a thesaurus
S’pretty close, yeah.
You made me think, actually, about getting a canon of text, measuring how close together words appear in it, and using that as a measure of similarity. So a religious text might have ‘church’ and ‘priest’ near each other, which would favour sentences that contained both…
Know of a good canon to use? And can I get it over the web?
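The “distances between words” idea can be made concrete by counting how often two words turn up within a few words of each other in the canon and comparing that with what chance would predict (pointwise mutual information). This Python sketch assumes the canon is already loaded as one plain-text string; `cooccurrence_pmi` and the five-word window are illustrative choices.

```python
import math
import re
from collections import Counter


def cooccurrence_pmi(text, window=5):
    """Return sim(a, b): pointwise mutual information of words a and b
    appearing within `window` words of each other in `text`."""
    words = re.findall(r"[a-z']+", text.lower())
    unigrams = Counter(words)
    pairs = Counter()
    for i, w in enumerate(words):
        # Look ahead up to `window` words; store pairs unordered.
        for v in words[i + 1 : i + 1 + window]:
            if v != w:
                pairs[frozenset((w, v))] += 1
    total_words = len(words)
    total_pairs = sum(pairs.values())

    def sim(a, b):
        joint = pairs[frozenset((a, b))]
        if joint == 0:
            return float("-inf")  # never seen near each other: unrelated
        p_ab = joint / total_pairs
        p_a = unigrams[a] / total_words
        p_b = unigrams[b] / total_words
        return math.log(p_ab / (p_a * p_b))

    return sim
```

A sentence could then be scored from `sim` over its word pairs; how to combine them (average of the related pairs, count of related pairs, and so on) is left open here.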
Bayesian stuff… yes, you can apply it to groups of words. Any “token” will do for Bayesian analysis, so word tuples (pairs) or triples wouldn’t be hard to do.
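Turning a line into tuple tokens takes only a few lines of Python. This sketch (function names hypothetical) produces two kinds: runs of adjacent words, which capture phrases like “scorched fire”, and unordered pairs of any two words in the line, which capture looser associations like cathedral/holy. Either kind can then be counted exactly like single-word tokens.

```python
import re
from itertools import combinations

WORD = re.compile(r"[a-z']+")


def ngram_tokens(line, sizes=(1, 2, 3)):
    """Single words plus adjacent word pairs and triples, space-joined."""
    words = WORD.findall(line.lower())
    return [
        " ".join(words[i : i + n])
        for n in sizes
        for i in range(len(words) - n + 1)
    ]


def pair_tokens(line):
    """Every unordered pair of distinct words in the line, e.g. 'cathedral+holy'."""
    words = sorted(set(WORD.findall(line.lower())))
    return [f"{a}+{b}" for a, b in combinations(words, 2)]


print(ngram_tokens("scorched fire", sizes=(2,)))  # ['scorched fire']
print(pair_tokens("holy cathedral"))  # ['cathedral+holy']
```

Sorting the words before pairing means “holy cathedral” and “cathedral holy” produce the same token, so the count doesn’t depend on word order.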
Best would be something like this.
The user chooses the sensible lines, and you form word tuples from them (you can even form tokens from concepts, so the sentence structure can be rated too), which the user can then rate. That lets them rate cathedral, holy, flowers, pure and blessed highly together, samurai less highly alongside those, and puked not at all.
Any lines that aren’t selected can have all their tuples generated and rated at 0.4 or so.
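The rating scheme above might look something like this in Python. It assumes ratings run from 0 (bad) to 1 (good), keys tuples on unordered word pairs, averages every rating a pair receives, and scores a line as the mean rating of its pairs. `TupleRatings`, `line_score`, and the 0.5 for never-rated pairs are illustrative; only the 0.4 for unselected lines comes from the comment.

```python
import re
from collections import defaultdict
from itertools import combinations

WORD = re.compile(r"[a-z']+")
UNSELECTED_RATING = 0.4  # default for tuples from lines the user skipped
UNKNOWN_RATING = 0.5     # assumption: neutral score for never-rated pairs


def word_pairs(line):
    """Unordered pairs of distinct words, each sorted so order doesn't matter."""
    words = sorted(set(WORD.findall(line.lower())))
    return list(combinations(words, 2))


class TupleRatings:
    """Keeps a running average rating (0 = bad, 1 = good) per word pair."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def rate(self, pair, rating):
        key = tuple(sorted(pair))
        self.total[key] += rating
        self.count[key] += 1

    def rate_unselected(self, line):
        """Give every pair in a skipped line the default low-ish rating."""
        for pair in word_pairs(line):
            self.rate(pair, UNSELECTED_RATING)

    def rating(self, pair):
        key = tuple(sorted(pair))
        n = self.count.get(key, 0)
        return self.total[key] / n if n else UNKNOWN_RATING


def line_score(line, ratings):
    """Mean rating over the line's word pairs."""
    pairs = word_pairs(line)
    if not pairs:
        return UNKNOWN_RATING
    return sum(ratings.rating(p) for p in pairs) / len(pairs)


ratings = TupleRatings()
ratings.rate(("cathedral", "holy"), 1.0)   # rated highly together
ratings.rate(("holy", "samurai"), 0.6)     # less highly
ratings.rate(("holy", "puked"), 0.0)       # not at all
ratings.rate_unselected("a planet had no hole")
print(line_score("holy cathedral", ratings))  # 1.0
```

The plain mean keeps the sketch simple; treating the ratings as probabilities and combining them the way a Bayesian spam filter combines token probabilities is an alternative.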
I can let you have the code I wrote for my Bayesian email spam filter if you like (it only works on single tokens, not tuples, though).