Turkish.AI Blog

Turkish Morphology

24.06.2023 By Ahmet Cüneyd Tantuğ Turkish, Morphology

Morphology is the field of linguistics that studies how words are formed. Morphological analysis is the process of analyzing the structure of a word and identifying its parts, such as roots, prefixes and suffixes. These parts are called morphemes.

For languages such as English, morphological analysis is not very complex, whereas it is much more complicated for languages like Turkish, Finnish, Hungarian and Czech. In those languages, a single word may undergo multiple derivations and/or inflections, with suffixes attached one after another like beads on a string. Morphological analysis is therefore required for a better understanding of their words, syntactic structure and semantics.

Like other agglutinative languages, Turkish poses various challenges due to its rich morphological structure. It has both derivational and inflectional suffixes, and more than 20,000 valid wordforms can be formed from a single noun root. This property results in very large vocabularies, so traditional vocabulary-based methods that work well for languages with relatively simpler morphology fail to achieve high performance.
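
To give a rough feel for where this growth starts, the sketch below counts only the purely inflectional analyses of a noun under a simplified slot model (number, possessive, case). The slot inventories and tag names beyond those used later in this post are assumptions made for illustration, not a complete description of Turkish. The function name `inflectional_analyses` is hypothetical, and the sketch counts analyses rather than distinct surface strings. Derivational suffixes, which the sketch ignores, multiply the number further.

```python
from itertools import product

# Simplified, ASSUMED slot inventories for Turkish noun inflection.
NUMBER = ["A3sg", "A3pl"]
POSSESSIVE = ["Pnon", "P1sg", "P2sg", "P3sg", "P1pl", "P2pl", "P3pl"]
CASE = ["Nom", "Acc", "Dat", "Loc", "Abl", "Gen"]


def inflectional_analyses(root):
    """Enumerate every number/possessive/case combination for a noun root."""
    return [
        f"{root}+Noun+{number}+{possessive}+{case}"
        for number, possessive, case in product(NUMBER, POSSESSIVE, CASE)
    ]


analyses = inflectional_analyses("kitap")
print(len(analyses))  # 2 * 7 * 6 = 84
print(analyses[0])    # kitap+Noun+A3sg+Pnon+Nom
```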

For example, let's consider the following sentence:

Yeni gelen kitaplar, kitaplıklarımızdaki yerini aldı.
(Newly arrived books have taken their place in our bookcases.)

The wordform kitaplıklarımızdaki is built up from its root step by step as follows:

Wordform	Suffix	English Gloss
kitap		book
kitaplık	-lık (derivational noun->noun)	bookcase
kitaplıklar	-lar (plural)	bookcases
kitaplıklarımız	-ımız (possessive)	our bookcases
kitaplıklarımızda	-da (locative)	in our bookcases
kitaplıklarımızdaki	-ki (derivational noun->adj)	(the one) in our bookcases
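
The derivation in the table above can be replayed with a few lines of Python. This is only a sketch: the suffixes are copied from the table in their already harmonized surface forms and are simply concatenated, so no vowel harmony or other phonological rules are computed. The names `STEPS` and `build_wordforms` are hypothetical.

```python
# Surface suffixes copied from the table; vowel harmony is NOT computed here,
# the already harmonized forms are simply concatenated.
STEPS = [
    ("lık", "derivational noun->noun"),
    ("lar", "plural"),
    ("ımız", "possessive"),
    ("da", "locative"),
    ("ki", "derivational noun->adj"),
]


def build_wordforms(root, steps):
    """Return the root followed by every intermediate wordform."""
    forms = [root]
    for suffix, _label in steps:
        forms.append(forms[-1] + suffix)
    return forms


for form in build_wordforms("kitap", STEPS):
    print(form)
```

A real morphological generator would start from abstract morphemes and derive these surface forms by rule instead of hard-coding them.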

In addition, morphological analysis of a Turkish wordform usually produces several competing analyses. A separate morphological disambiguation step is required to pick the correct analysis according to the context in the sentence. Here is an example where the input wordform is elması:

elma+Noun+A3sg+P3sg+Nom (his/her apple)
elmas+Noun+A3sg+P3sg+Nom (his/her diamond)
elmas+Noun+A3sg+Pnon+Acc (diamond [accusative])
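
Before a disambiguator can choose among such candidates, it needs them in a structured form. The following is a minimal sketch, assuming the `+`-separated notation shown above (root, part of speech, then feature tags). The `Analysis` class and `parse_analysis` function are hypothetical names, and no actual disambiguation is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    root: str
    pos: str
    features: tuple


def parse_analysis(text):
    """Split 'root+POS+tag+tag...' into its components."""
    root, pos, *features = text.split("+")
    return Analysis(root, pos, tuple(features))


# Candidate analyses for the wordform "elması", copied from the list above.
CANDIDATES = [
    "elma+Noun+A3sg+P3sg+Nom",
    "elmas+Noun+A3sg+P3sg+Nom",
    "elmas+Noun+A3sg+Pnon+Acc",
]

parsed = [parse_analysis(candidate) for candidate in CANDIDATES]
roots = sorted({analysis.root for analysis in parsed})
print(roots)  # ['elma', 'elmas']
for analysis in parsed:
    print(analysis.root, analysis.pos, analysis.features)
```

The two distinct roots make the ambiguity explicit: the same surface string may be about an apple or a diamond, and only the sentence context can decide.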

Although some tasks, such as text classification, may perform satisfactorily on raw wordforms in Turkish, proper morphological analysis is preferred for deeper analysis or to improve the performance of an NLP task.

About Author

Ahmet Cüneyd TANTUĞ

Cüneyd is a computer engineer who has worked in Machine Learning and Natural Language Processing, primarily on Turkish, for more than 20 years. He holds a PhD in NLP.
