srmorph: Serbian Morphology in Python

srmorph continues my interest in linguistics and programming with an experiment in morphology. It is a pilot project I use to test ideas about parsing my native language, Serbian, first at the word level and later at the syntactic level. This post describes the work in progress.

What Can Be Seen, Searched, Parsed?

For the time being, the project has only a Web/AJAX interface, which allows:

Affixes as Basics

Serbian affixes are the foundation of srmorph. I have always wanted to write a parser that begins by examining words at the level of prefixes and suffixes (infixes are a somewhat tougher problem). For now, therefore, the analysis is based on identifying affixes.
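As a rough illustration of what affix-first analysis can mean, the sketch below lists every way a word can be split into prefix, stem and suffix. The function name `split_affixes` and the tiny `PREFIXES` and `SUFFIXES` inventories are assumptions made for this example; they are not srmorph's actual code or data.

```python
# Hypothetical, tiny affix inventories for illustration only;
# they are not the project's actual data.
PREFIXES = ("пре", "на", "по")
SUFFIXES = ("ом", "а", "у", "е")


def split_affixes(word, prefixes=PREFIXES, suffixes=SUFFIXES):
    """Return every (prefix, stem, suffix) split consistent with the inventories.

    Empty strings stand for "no prefix" or "no suffix", so the bare word
    is always among the candidates.
    """
    candidates = []
    for prefix in ("",) + tuple(prefixes):
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ("",) + tuple(suffixes):
            if not rest.endswith(suffix):
                continue
            stem = rest[:len(rest) - len(suffix)]
            if stem:  # a split must leave a non-empty stem
                candidates.append((prefix, stem, suffix))
    return candidates


print(split_affixes("доктором"))
# [('', 'доктором', ''), ('', 'доктор', 'ом')]
```

The function only enumerates candidates. Deciding which split is linguistically right is a separate step that needs grammatical information about each affix.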

Environment and Data Format

The environment is the Python 3 programming language, and the grammar data format is built on Python classes themselves. The uninstantiated classes are the actual data containers; once they inherit from the main meta classes, they become useful for parsing. For example, a class containing declension suffixes looks like this:

class AffNounDeclension0(MAffix):
    """Suffix. Example: 'доктор'. Ref. Klajn:51."""
    pos = 'MNoun'
    place = 'end'
    process = ('inflection', 'declension')
    subtype = 1
    gender = 'm'
    suffix = {0:'', 1:'а', 2:'у', 3:'а', 4:'е', 5:'ом', 6:'у'}
    blendswith = ('nonpalatal',)

The suffix attribute lists the seven endings attached to some masculine nouns in Serbian (Croatian, Bosnian), one for each grammatical case in the singular. The pos attribute identifies the word class, here a noun, and so on.
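To show how an uninstantiated class can act as a data container, here is a minimal sketch. The real MAffix is not shown in this post, so the `MAffix` below is a hypothetical stand-in, and the `forms` class method is an invented helper. The sketch also assumes that the keys 0 to 6 each label one form of the noun.

```python
class MAffix:
    """Hypothetical stand-in for the project's base class."""

    suffix = {}

    @classmethod
    def forms(cls, stem):
        """Attach each ending to the stem, keyed as in `suffix`."""
        return {key: stem + ending for key, ending in cls.suffix.items()}


class AffNounDeclension0(MAffix):
    """Same data as the class above, trimmed to the attributes used here."""
    pos = 'MNoun'
    gender = 'm'
    suffix = {0: '', 1: 'а', 2: 'у', 3: 'а', 4: 'е', 5: 'ом', 6: 'у'}


# No instance is created: the class object itself carries the data.
paradigm = AffNounDeclension0.forms('доктор')
for key, form in paradigm.items():
    print(key, form)
```

Key 0 leaves the stem unchanged ('доктор'), while key 5 yields 'доктором'. Because the data lives in class attributes, adding a new declension type amounts to writing one more subclass.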

Parsing and Website

So far, the inherited Serbian affix classes (60+) are parsed functionally. I have set up a dynamic website that shows some of the things that can be done by parsing. For now the algorithm is rather straightforward, and it will remain so until further filtering is introduced at the word class level.
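One straightforward way to parse against such classes is to walk over every subclass of the base class and report each ending that matches the end of a word. The sketch below does that under stated assumptions: `MAffix` is again a hypothetical stand-in, `analyse` is an invented function name, and `AffDemoFeminine` is a made-up class added only so that the search has a second class to look through.

```python
class MAffix:
    """Hypothetical stand-in for the project's base class."""
    suffix = {}


class AffNounDeclension0(MAffix):
    pos = 'MNoun'
    suffix = {0: '', 1: 'а', 2: 'у', 3: 'а', 4: 'е', 5: 'ом', 6: 'у'}


class AffDemoFeminine(MAffix):
    # Invented for this example; not one of the project's classes.
    pos = 'MNoun'
    suffix = {0: 'а', 1: 'е', 2: 'и', 3: 'у', 4: 'о', 5: 'ом', 6: 'и'}


def analyse(word, base=MAffix):
    """Return (class name, key, stem) for every ending that fits the word."""
    hits = []
    for cls in base.__subclasses__():  # direct subclasses only
        for key, ending in cls.suffix.items():
            if not word.endswith(ending):
                continue
            stem = word[:len(word) - len(ending)]
            if stem:  # ignore matches that would consume the whole word
                hits.append((cls.__name__, key, stem))
    return hits


for hit in analyse('доктором'):
    print(hit)
```

For 'доктором' this reports three candidates: the empty ending and 'ом' from AffNounDeclension0, plus 'ом' from the invented feminine class. That kind of over-generation is what filtering by word class would have to prune.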

Once reasonably developed, the project will become open source.

Screenshot: details about the affix “na” in Serbian (all classes with the suffix "na")

Published by Romeo Mlinar
