
Bard Bot

8 min read · May 22, 2021


I’ve been including synthesized vocals in my music recently. The last song I wrote was Hymn for the Polymorphic Goddess, and it incorporates some Google text-to-speech. I have no skill or interest in writing lyrics, so for this song I used the first few terms of a famous integer sequence. This fits the theme of a technological deity nicely, but won’t make sense for my other songs. I needed a new approach.

So I decided to write a program that would generate random poetry. It didn’t have to be any good, or even make sense; it just had to have the right rhythm. If it could at least generate something interesting, the user could use it as a starting point for their own lyrics. The obvious solution would be to throw machine learning at the problem, but that’s no fun. So I grabbed some pronunciation and part-of-speech (PoS) data sets, and set to work.

Grammar

The first step was to just generate random sentences. The simplest way to do this using the data I had was to generate a grammatically valid string of PoS, then select random words from the PoS data set to match. Unfortunately, asking a linguist for an EBNF representation of English grammar is similar to asking a web developer for a regex that can parse HTML. You’ll find a lot of detailed explanations of why it’s not possible, and a few people who attempt it anyway.

For my purposes though, a solution that only sort of works is good enough. So using some of the half-baked EBNF I found online as inspiration, I made this drastically over-simplified model of English grammar:

S = NP, VP, [C, S]
NP = (D, {A}, N | {A}, p | r), [P, NP]
VP = [v] (V | t, NP)

(N=noun, p=plural noun, V=verb, t=transitive verb, v=adverb, A=adjective, C=conjunction, P=preposition, r=pronoun, D=definite article)
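To make the grammar concrete, here’s a minimal sketch (not the bot’s actual code) of expanding it into a random PoS string by recursive descent. The weights and the recursion depth cap are placeholder assumptions, not the values the bot uses.

```python
import random

def gen_NP():
    # NP = (D, {A}, N | {A}, p | r), [P, NP]
    r = random.random()
    if r < 0.4:
        s = "D" + "A" * random.choice([0, 0, 1]) + "N"
    elif r < 0.8:
        s = "A" * random.choice([0, 1]) + "p"
    else:
        s = "r"
    if random.random() < 0.2:       # optional prepositional phrase
        s += "P" + gen_NP()
    return s

def gen_VP():
    # VP = [v] (V | t, NP)
    s = "v" if random.random() < 0.3 else ""
    if random.random() < 0.5:
        s += "V"                    # intransitive verb
    else:
        s += "t" + gen_NP()         # transitive verb plus object
    return s

def gen_S(depth=0):
    # S = NP, VP, [C, S]; depth cap added here to guarantee termination
    s = gen_NP() + gen_VP()
    if depth < 2 and random.random() < 0.15:
        s += "C" + gen_S(depth + 1)
    return s

print(gen_S())   # e.g. "DNvV" or "rtDAN"
```

Each character of the result is then replaced by a random word with that part of speech.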

I hard-coded some arbitrary random weights wherever there’s an option or repetition. Also, since the data sets are pretty huge and include a lot of obscure words, I’m using a word frequency data set to restrict the results to the most common 30k words. Here are some examples it generated:

rPrvV: “Yourself minus yourself alternatively duplicating”
DANvV: “Either intimate outrage probably emerged”
ApvVCrtAp: “Motivational batteries saturdays proceed nevertheless whatever foil unpredictable tears”
pVCrtDaV: “Campuses happen therefore everybody exclude another cellular researcher”

These sentences are not very sensible, but they’re good enough for my bot. The main limitation is that the PoS data set is imperfect. Its categories are pretty coarse, and many of the words with multiple possible PoS include very obscure alternatives. I can’t use the frequency data set to restrict which PoS are allowed for a particular word, so it will often choose to use a common word in an uncommon way. Another issue is that the frequency data set includes common misspellings that happen to match uncommon words (for example ye or nigh).

The user can work around these issues by just generating more examples until they find some good ones, and adjusting pluralization and tense manually.

Counting syllables

The pronunciation data set looks like this:

disaster D IH0 Z AE1 S T ER0
disasters D IH0 Z AE1 S T ER0 Z
disastrous D IH0 Z AE1 S T R AH0 S
disastrously D IH0 Z AE1 S T R AH0 S L IY0

Each word is followed by a list of phoneme codes, some of which have a number after them. The numbers indicate where the syllables are, and what the emphasis of each syllable is. There are three levels of emphasis: none (0), primary (1), and secondary (2). So we can calculate the number of syllables for each word by just counting how many emphasis numbers there are.
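Since only vowel phonemes carry a stress digit, the count falls out of a one-liner. A minimal sketch, assuming the dictionary lines look like the examples above:

```python
def syllables(pron_line):
    # Split a pronunciation line into the word and its phonemes, then
    # count the phonemes that end in a stress digit (0, 1, or 2) --
    # one per syllable.
    word, *phonemes = pron_line.split()
    count = sum(1 for ph in phonemes if ph[-1].isdigit())
    return word, count

print(syllables("disaster D IH0 Z AE1 S T ER0"))   # ('disaster', 3)
```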

Adding the constraint that the sentence must have 7 syllables, we get these results:

“Myself totally produce”
“Polished bomb baths lastly stay”
“Other terry weigh oneself”
“Nothing proudly reveal dues”
“She accidentally bray”

These work ok, but we can see that the rhythm of the phrases varies quite a lot (especially in the last example), even if the number of syllables matches.

So far I haven’t talked about the implementation details of any of the algorithms, because it’s all been very straightforward. But the next step is where things start to get interesting.

Cadence matching

Rather than just matching the number of syllables, we need to match the cadence of the sentence. The cadence is just the string of emphases from each syllable. So for the “disaster” example above, the cadence would be “010”. To do this efficiently, we need some sort of dictionary of emphasis. We want to be able to quickly look up a list of words with a particular cadence for some PoS. The outer layer of our dictionary is just a map from PoS to the cadence dictionary. The cadence dictionary is the juicy bit.

If we just wanted to find exact matches for a cadence, we could just make another map using the cadence string (eg “010”) as the key. But that’s too restrictive. Being that precise would make the cadence sound stilted, and also make it very difficult for the bot to find matching phrases. The user should be able to specify, say, that they want a 7-syllable phrase with strong emphasis on the first and fifth syllables, but where the emphasis on the other syllables can be anything.

My solution is to store the words in a tree. Each node of the tree represents the emphasis of a syllable, has up to 3 children (one for each possible emphasis of the next syllable), and has a list of words that match the cadence of the path to that node.

A cadence tree

We can find inexact matches using this tree, by walking along it and maintaining a list of matches. For example, if we’re looking for all the words with cadence 1?0 (primary emphasis, any emphasis, no emphasis):
1. Our match list starts with just the root in it, [root]
2. Then we walk along the “1” edge, [1]
3. Next, we walk along all the outgoing edges, [1→0, 1→1, 1→2]
4. Finally, we walk along the “0”, [1→0→0, 1→1→0, 1→2→0]

So at each step, for each node in the current match set, we walk along each matching outgoing edge. Once we have our set of matching nodes, we select a random node from the set, and a random word from that node.

Tying it all together, we generate a string of PoS using our algorithm from earlier, and make sure it’s the same length or shorter than our cadence specification. Then we randomly assign a number of syllables to each word in the PoS string, such that the total number of syllables matches the length of the cadence spec. Next, we iterate over the PoS string, look up the cadence dictionary for that PoS, then select a random word from it, which matches the corresponding slice of the cadence spec.
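The “randomly assign a number of syllables to each word” step is a classic composition problem: split n syllables across k words so each word gets at least one. A naive sketch using the stars-and-bars trick (the real bot presumably also has to retry when no word matches a slice, which this omits):

```python
import random

def split_syllables(n, k):
    # Choose k-1 cut points among the n-1 gaps between syllables,
    # giving k parts that each have at least one syllable.
    cuts = sorted(random.sample(range(1, n), k - 1))
    bounds = [0] + cuts + [n]
    return [bounds[i + 1] - bounds[i] for i in range(k)]

spec = "10??101"                       # 7-syllable cadence spec
counts = split_syllables(len(spec), 3)
print(counts)                          # e.g. [4, 2, 1] -- sums to 7

# Each word is then matched against its slice of the spec:
slices, i = [], 0
for c in counts:
    slices.append(spec[i:i + c])
    i += c
print(slices)                          # e.g. ['10??', '10', '1']
```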

Here are some sentences the bot produced using the cadence spec “10??101”:

“Anybody deeply wink”
“Buddies hitherto converged”
“Everybody not conform”
“Something strangely license wares”
“Goods with wits verbatim have”
“Dozen clay aboard destroy”
“Arts alike tow words with drops”

These results are much better. The rhythm is quite pleasant to read.

Rhyming

The final constraint to add is rhyming. Essentially, we just need to make sure the last few phonemes of each line match. Like before, to do this efficiently, we have to store the words in a suitable data structure.

Broadly speaking, the rhyme data structure is similar to the cadence tree. At each leaf of the cadence tree, we were storing a list of words with that cadence. Now, we need to replace that simple list with another tree. Each node of the rhyme tree represents one phoneme, and each node stores a list of words ending with the phonemes corresponding to the path to that node.

A rhyme tree

We can walk down the tree following the phonemes of our rhyme (in reverse), and choose a random word from the list at the final node.

Now obviously we don’t want to store the full list of words at each node, because this would lead to a lot of duplication (each of the words in the example above appears in multiple nodes). So instead we store the words for the whole rhyme tree in a single list, and store offsets into the list on each node. If we store the words in DFS order, all the words matching any rhyme phonemes we can think of will occupy a contiguous region within the list.

A rhyme tree with a DFS order word list

So we follow the rhyme phonemes down the tree, and when we reach our target node, we look up its region within the word list, and select a random word from that range.
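Here’s a sketch of that structure (again an illustrative reimplementation, with simplified stress-free phonemes): words are inserted by their phonemes reversed, then a single DFS pass flattens every node’s words into one shared list, so each node only stores a (start, end) range into it.

```python
class RhymeNode:
    def __init__(self):
        self.children = {}   # phoneme -> RhymeNode
        self.words = []      # words whose pronunciation ends exactly here
        self.start = self.end = 0

def insert(root, phonemes, word):
    node = root
    for ph in reversed(phonemes):
        node = node.children.setdefault(ph, RhymeNode())
    node.words.append(word)

def flatten(node, out):
    # DFS layout: everything in this node's subtree lands in the
    # contiguous region out[node.start:node.end].
    node.start = len(out)
    out.extend(node.words)
    for child in node.children.values():
        flatten(child, out)
    node.end = len(out)
    return out

def rhymes(root, phonemes, words):
    # Follow the rhyme phonemes (in reverse) down the tree, then read
    # off the target node's region of the shared word list.
    node = root
    for ph in reversed(phonemes):
        node = node.children[ph]
    return words[node.start:node.end]

root = RhymeNode()
insert(root, ["AE", "T"], "at")
insert(root, ["K", "AE", "T"], "cat")
insert(root, ["K", "AA", "P", "IY", "K", "AE", "T"], "copycat")
words = flatten(root, [])
print(rhymes(root, ["AE", "T"], words))   # ['at', 'cat', 'copycat']
```

Note that `node.words` already records which words end exactly at each node, which is the extra per-node index the short-word edge case below relies on.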

One important edge case to handle is when a word in the tree is shorter than the rhyme phonemes we’re trying to match. We need to allow the extra phonemes not covered by the short word to be covered by the previous word in the sentence. This will matter if we want to do multi-syllable rhyming.

We can handle this easily enough by storing an additional index on each node, to mark which words end at that node (for example, the [“at”, “cat”] node would have another index marking that “at” ends at this node). Then as we walk down the tree, we look up each node’s ending words, and include them in our set when selecting a random word. For example, if we were trying to rhyme with “copycat” using [K, AE, T], we would walk to the “cat” node, but our random set of words to choose from would include “at”, since it terminates at the [“at”, “cat”] node we passed through. That way, the K phoneme could be covered by the preceding word (eg “pack”).

There’s also the detail that we don’t want the bot to make a rhyme by just reusing the word, but this constraint is easy to add to the existing framework.

Putting it all together, the bot generates these rhymes:

Yourself overseas relied.
Everyone sincerely ride.

Seals anew emerging more.
Tears denying thou before.

Huge surroundings thus reside.
Nuts good youths around them slide.

Those just bike to thou was though.
Cans beyond her poke this toe.

We with whom evoke it lest.
Ours to crowned aid is addressed.

Something newly happen for.
Most move through sick gases roar.

Wrapping up

I made some final tweaks to allow for non-trivial rhyme schemes, and to let sentences span multiple lines. Then I wrapped it in a little web UI, so you can try it yourself: https://liamappelbe.github.io/bard

About 20% of the stanzas it generates are good enough for my original goal of putting synthesized lyrics in my music, so I just generate a bunch and pick the best. Here are some examples using an ABAB rhyme scheme, and an alternating 8–7–8–7 syllable cadence:

Every objective caffeine
commonly succeed before.
Any urban feeling between
yourself out declining nor.

Anybody nestle unless
his in try posh cords whereas.
Nice salons adversely express
some good lame welsh robin as.

Least for convey hot good tattoos.
Thy will afternoons abstain.
Much good high born nicely excuse.
Everybody there refrain.

Nothing barring some secure bed
thus convinced hers under thou.
Companies considering dead
deputies anew meow.


Written by Liam Appelbe

Code monkey, board game hoarder, aspirant skeptic
