
Uyghur’s myriad verb forms produce over 42 million words

In a previous post I wrote about the latest candidate for the longest word in the Uyghur language. In that post we saw that agglutinative languages like Uyghur can produce extremely long words. Another feature of such languages is the sheer number of verb forms that can be produced. Mood, voice, aspect, tense, person and number are all conveyed by suffixes, which must be attached to the root verb in the appropriate order.
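If each suffix slot is filled (or left empty) independently of the others, the number of forms is the product of the number of choices in each slot. That is how a few slots quickly multiply into thousands of forms. Here is a minimal sketch of that counting idea. The slot names follow the categories listed above, but the suffix labels, the number of options per slot and the slot order are placeholders, not real Uyghur morphology. Treating slots as fully independent is also a simplifying assumption.

```python
from itertools import product
from math import prod

# Toy slot inventory. The labels are placeholders, NOT real Uyghur suffixes,
# and the slot order is illustrative only. "" means the slot is left empty.
SLOTS = [
    ("voice", ["", "VOICE1", "VOICE2"]),
    ("aspect", ["", "ASP1"]),
    ("mood", ["", "MOOD1", "MOOD2"]),
    ("tense", ["PRES", "PAST"]),
    ("person_number", ["1SG", "2SG", "3", "1PL"]),
]


def generate(root: str):
    """Yield every form obtained by choosing one option per slot, in slot order."""
    for combo in product(*(options for _, options in SLOTS)):
        yield root + "".join(f"-{s}" for s in combo if s)


# The count is the product of the slot sizes: 3 * 2 * 3 * 2 * 4.
count = prod(len(options) for _, options in SLOTS)
forms = list(generate("ROOT"))
print(count, len(forms))
```

Adding one more optional slot with two suffixes would multiply the total by three. This is why real inventories grow so quickly.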

Recently Alim Ahat, founder of Uighursoft, wrote an article suggesting that Uyghur may have a store of some 50 million words when you include all the possible forms of every verb. Alim was working on a new edition of Uighursoft’s spell-checker when he took the opportunity to carry out some mathematical analysis of Uyghur verb inflections.

By his calculations, the total number of possible verb forms in modern literary Uyghur is 8,455. When he applied these forms to all the verbs available in his software, he arrived at an astonishing 42,613,200 synthetic words!
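The post does not say how many verbs the software holds, but the two reported figures divide exactly. A quick check, using only the numbers given above:

```python
total_words = 42_613_200  # total reported in the post
verb_forms = 8_455        # number of verb forms reported in the post

# Integer division with remainder: how many verbs the total implies.
verbs, remainder = divmod(total_words, verb_forms)
print(verbs, remainder)  # 5040 0
```

The quotient, 5,040, happens to match the number of forms generated for aptomatlash- in the example that follows. This may mean the figures are 5,040 forms per verb across 8,455 verbs rather than the other way round. The post itself does not say.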

Alim says that no matter what form of what verb you write, his spell-checker will find it 99.99% of the time, and the total number of words and expressions stored in his software tool approaches 50 million.
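A spell-checker that stores every generated form can check a word with a single exact-match lookup. The post does not describe how Uighursoft's tool is built, so the following is only a minimal sketch under that assumption. It uses a hash set and a handful of forms taken from the list in this post. A real store of tens of millions of entries would need a more compact structure.

```python
# A few forms of aptomatlash- taken from the list in this post.
# A real lexicon would hold tens of millions of entries.
LEXICON = frozenset({
    "aptomatlashmaq",
    "aptomatlishish",
    "aptomatliship",
    "aptomatlashqan",
    "aptomatlishidu",
    "aptomatlashti",
    "aptomatlishidighan",
    "aptomatlishe",
    "aptomatlishishqa",
    "aptomatlashturulghudekla",
})


def is_known(word: str) -> bool:
    """Return True if the word, after trimming and lowercasing, is a stored form."""
    return word.strip().lower() in LEXICON


print(is_known("aptomatlashti"), is_known("aptomatlashxyz"))  # True False
```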

In case you’re having trouble believing all this, he provides an example of what happens when you enter the verb root aptomatlash- (“to be automated”) into the software. With the press of a button it generates a list of 5040 different forms. Here are the first ten, and the 5040th:

1. aptomatlashmaq
2. aptomatlishish
3. aptomatliship
4. aptomatlashqan
5. aptomatlishidu
6. aptomatlashti
7. aptomatlishidighan
8. aptomatlishe
9. aptomatlishishqa
10. aptomatlishishi
5040. aptomatlashturulghudekla
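The list also shows a pattern in the stem itself. Before suffixes beginning with a consonant (-maq, -qan, -ti, -turulghudekla), the root keeps the form aptomatlash-. Before suffixes beginning with a vowel (-ish, -ip, -idu, -idighan, -e, -ishqa), its last vowel becomes i, giving aptomatlish-. The sketch below encodes that single rule. The suffix segmentation is our own reading of the forms above, and the vowel inventory is an assumption for Uyghur Latin script. This is a simplification, not the rule set Uighursoft's generator uses.

```python
VOWELS = "aeéiouöü"  # assumed vowel letters of Uyghur Latin script


def attach(stem: str, suffix: str) -> str:
    """Join a suffix to a verb stem (simplified rule inferred from the list).

    If the suffix begins with a vowel and the stem's last vowel is 'a' or 'e',
    that vowel is replaced by 'i'; otherwise the stem is left unchanged.
    """
    if suffix and suffix[0] in VOWELS:
        # Scan backwards for the stem's last vowel.
        for i in range(len(stem) - 1, -1, -1):
            if stem[i] in VOWELS:
                if stem[i] in "ae":
                    stem = stem[:i] + "i" + stem[i + 1:]
                break
    return stem + suffix


# Suffixes segmented (by us, for illustration) from forms 1-9 and 5040.
SUFFIXES = ["maq", "ish", "ip", "qan", "idu", "ti", "idighan", "e", "ishqa",
            "turulghudekla"]

for suffix in SUFFIXES:
    print(attach("aptomatlash", suffix))
```

Running it reproduces forms 1 to 9 and form 5040 exactly. Real Uyghur morphophonology is richer, with vowel harmony and consonant alternation inside the suffixes themselves, so this rule covers only the forms shown here.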

Perhaps most astonishing of all is that, as I look down the list, I have yet to see a form that I do not recognize as a possible form of the word. In other words, a person who knows the language could (in theory) form and use any of these words. It seems the human brain is indeed “hard-wired” for language (Chomsky?) and has an amazing capacity to synthesize language according to the “rules”.