Turkish Morphology
Morphology is the field of linguistics that studies how words are formed. Morphological analysis is the process of analyzing the structure of a word and identifying its parts, such as roots, prefixes and suffixes. These parts are called morphemes.
For some languages, such as English, morphological analysis is not very complex, whereas it is much more complicated for languages like Turkish, Finnish, Hungarian and Czech. In these languages, a single word may carry multiple derivational and/or inflectional suffixes attached one after another, like beads on a string. Morphological analysis is therefore required to better understand the words, syntactic structure and semantics of these languages.
Like other agglutinative languages, Turkish poses various challenges due to its rich morphological structure. It has both derivational and inflectional suffixes, and more than 20,000 valid wordforms can be formed from a single noun root. This property results in very large vocabularies, so traditional vocabulary-based methods that work well for languages with simpler morphology fail to achieve high performance on Turkish.
For example, let's consider the following sentence:
Yeni gelen kitaplar, kitaplıklarımızdaki yerini aldı.
(Newly arrived books have taken their place in our bookcases.)
The wordform kitaplıklarımızdaki is built up step by step as follows:
Wordform | Suffix | English Gloss |
kitap | | book |
kitaplık | -lık (derivational noun->noun) | bookcase |
kitaplıklar | -lar (plural) | bookcases |
kitaplıklarımız | -ımız (possessive, 1st person plural) | our bookcases |
kitaplıklarımızda | -da (locative) | in our bookcases |
kitaplıklarımızdaki | -ki (derivational noun->adj) | (the one) in our bookcases |
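The suffix chain in the table can be sketched in code. The following is a minimal illustration, not a full model of Turkish phonology: it assumes a common template notation in which `A` stands for a/e, `I` for ı/i/u/ü and `D` for d/t, and it ignores consonant softening, buffer consonants and exceptions. The function names `attach` and `derive` are hypothetical and chosen only for this example.

```python
# Toy suffixation with vowel harmony; a simplified sketch only.
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")
VOWELS = BACK_VOWELS | FRONT_VOWELS
VOICELESS = set("çfhkpsşt")

# Four-way harmony for the template symbol I (ı/i/u/ü),
# keyed by the last vowel of the stem.
HIGH_VOWEL = {"a": "ı", "ı": "ı", "o": "u", "u": "u",
              "e": "i", "i": "i", "ö": "ü", "ü": "ü"}


def last_vowel(word: str) -> str:
    """Return the last vowel of the word, which drives vowel harmony."""
    for ch in reversed(word):
        if ch in VOWELS:
            return ch
    raise ValueError(f"no vowel in {word!r}")


def attach(stem: str, template: str) -> str:
    """Attach one suffix template to a stem, resolving A, I and D."""
    out = stem
    for symbol in template:
        if symbol == "A":      # two-way harmony: a after back vowels, else e
            out += "a" if last_vowel(out) in BACK_VOWELS else "e"
        elif symbol == "I":    # four-way harmony: ı/i/u/ü
            out += HIGH_VOWEL[last_vowel(out)]
        elif symbol == "D":    # t after a voiceless consonant, else d
            out += "t" if out[-1] in VOICELESS else "d"
        else:                  # any other character is copied literally
            out += symbol
    return out


def derive(root: str, templates: list[str]) -> list[str]:
    """Return the root followed by every intermediate wordform."""
    forms = [root]
    for template in templates:
        forms.append(attach(forms[-1], template))
    return forms


if __name__ == "__main__":
    for form in derive("kitap", ["lIk", "lAr", "ImIz", "DA", "ki"]):
        print(form)
```

Running the script prints the six wordforms of the table in order, from kitap to kitaplıklarımızdaki. The same templates applied to a front-vowel root such as ev (house) yield evlerimizdeki, which shows why a suffix cannot be treated as a fixed string.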
In addition, morphological analysis of a Turkish wordform usually produces several ambiguous results. A separate morphological disambiguation step is required to pick the analysis that fits the context of the sentence. For example, the wordform elması has the following analyses:
- elma+Noun+A3sg+P3sg+Nom (his/her apple)
- elmas+Noun+A3sg+P3sg+Nom (his/her diamond)
- elmas+Noun+A3sg+Pnon+Acc (diamond [accusative])
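To see where this ambiguity comes from, here is a minimal sketch of a toy analyzer. It is not how a real analyzer such as a two-level or finite-state system works: the two-word lexicon, the suffix table and the function names are assumptions made for illustration, and only the back unrounded vowel ı is covered, which is enough for stems whose last vowel is a or ı.

```python
VOWELS = set("aeıioöuü")

# Hypothetical toy lexicon: root -> part of speech.
LEXICON = {"elma": "Noun", "elmas": "Noun"}


def suffix_options(stem: str) -> list[tuple[str, str]]:
    """Return (surface suffix, tags) pairs for a noun stem.

    Simplification: harmony is hard-coded to the vowel ı.
    """
    if stem[-1] in VOWELS:
        # After a vowel, buffer consonants appear: -sı (P3sg), -yı (Acc).
        return [("", "A3sg+Pnon+Nom"),
                ("sı", "A3sg+P3sg+Nom"),
                ("yı", "A3sg+Pnon+Acc")]
    # After a consonant, both suffixes surface as the same string -ı.
    return [("", "A3sg+Pnon+Nom"),
            ("ı", "A3sg+P3sg+Nom"),
            ("ı", "A3sg+Pnon+Acc")]


def analyze(wordform: str) -> list[str]:
    """Enumerate every root + suffix combination that yields the wordform."""
    analyses = []
    for root, pos in LEXICON.items():
        for suffix, tags in suffix_options(root):
            if root + suffix == wordform:
                analyses.append(f"{root}+{pos}+{tags}")
    return analyses


if __name__ == "__main__":
    for analysis in analyze("elması"):
        print(analysis)
```

The script prints the three analyses listed above. The ambiguity has two sources: the surface string can be segmented as elma+sı or elmas+ı, and after a consonant the possessive and accusative suffixes share the same surface form. Choosing among the candidates needs sentence context, which this sketch does not attempt.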
Although some tasks, such as text classification, may perform satisfactorily on raw Turkish wordforms, proper morphological analysis is preferred for deeper analysis or to improve the performance of an NLP task.
Useful Resources
- Oflazer, K., 1994. Two-level description of Turkish morphology. Literary and Linguistic Computing, 9(2), pp.137-148.
- Zemberek-NLP: Natural Language Processing tools for Turkish
- Sak, H., Güngör, T. and Saraçlar, M., 2011. Resources for Turkish morphological processing. Language Resources and Evaluation, 45, pp.249-261.