About Language Bits

Here are some of my projects related to language and/or programming. I use various tools, chosen according to what I like or know at the moment and what I decide to learn next. My main programming language is Python, but you will also find R listed below. Other technologies I’m familiar with (to varying extents) and like to use are HTML, CSS, JSON, XML, Tk, Jinja2 templates, MongoDB, jQuery, JS, NGINX/Apache, Ubuntu… Most of the time Emacs is my editor.

Just a hint: to see other things I wrote about, have a look at Sections/Recent.

Jotpub is a website I developed to assist in the creation of language/grammar tests. Users can create, solve, and share tests, and export them as PDF. The tests are stored as JSON documents, which are then parsed into HTML/DOM. The database is MongoDB. The site is written in Python, with CherryPy as the framework and Jinja2 templates. It relies on pure JavaScript (a note to myself: don’t do this; please use a framework). The server that runs it all is NGINX. It took me several months of work to create a usable version.
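To give an idea of the JSON-to-HTML step, here is a minimal sketch in Python. It is not the actual Jotpub code: the document layout (`title`, `questions`, `prompt`, `options`, `answer`) and the function name `render_test` are assumptions made up for illustration, and the real site may well do this step in the browser.

```python
import json
from html import escape

# A hypothetical test document; the field names are assumptions.
SAMPLE = '''
{"title": "Articles",
 "questions": [
   {"prompt": "She is ___ engineer.", "options": ["a", "an", "the"], "answer": 1}
 ]}
'''


def render_test(doc: dict) -> str:
    """Turn a parsed test document into a simple HTML fragment."""
    parts = [f"<h1>{escape(doc['title'])}</h1>", "<ol>"]
    for question in doc["questions"]:
        # Escape every user-supplied string before it reaches the page.
        options = "".join(f"<li>{escape(o)}</li>" for o in question["options"])
        parts.append(f"<li><p>{escape(question['prompt'])}</p><ul>{options}</ul></li>")
    parts.append("</ol>")
    return "\n".join(parts)


html_out = render_test(json.loads(SAMPLE))
print(html_out)
```

The answer index stays in the JSON and is deliberately left out of the rendered fragment, so the same document can serve both the solving view and the checking step.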

R-diphthongs-sr-en is a collection of tools I developed for my MA thesis. It takes raw data produced by an automated Praat instance and feeds it to R code. The code then parses the numbers (formants, pitch, intensity) and produces tables and graphs. The main programming language here is R (from The R Project for Statistical Computing), while Python is used to verify Praat TextGrids. You can find more about these tools here, and my MA paper in experimental phonetics is here.

Srmorph is a two-part project. The first part is something I call Pythonic experiments in Serbian morphology. Basically, Python objects hold all the affix data, which is then used for analysis. I also use srmorph to create a text corpus from various text sources. The second part is the site that serves as the interface to the data and analysis. The tools used are Python, jQuery, and CherryPy. This is the project I usually work on nowadays, and you can find more about it here.
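The idea of keeping affix data in Python objects can be sketched roughly like this. It is a toy illustration, not srmorph itself: the `Suffix` class, the `analyse` function, and the handful of endings (taken from the declension of a noun like "žena") are assumptions chosen to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suffix:
    form: str   # the ending itself
    label: str  # what the ending may signal


# A tiny, hypothetical sample of feminine a-stem noun endings.
SUFFIXES = [
    Suffix("a", "nominative singular"),
    Suffix("e", "genitive singular"),
    Suffix("u", "accusative singular"),
    Suffix("om", "instrumental singular"),
    Suffix("ama", "dative/instrumental/locative plural"),
]


def analyse(word):
    """Return (stem, suffix) candidates, longest suffix first."""
    hits = [
        (word[: -len(s.form)], s)
        for s in SUFFIXES
        if word.endswith(s.form) and len(word) > len(s.form)
    ]
    return sorted(hits, key=lambda hit: len(hit[1].form), reverse=True)


for stem, suffix in analyse("ženama"):
    print(stem, "+", suffix.form, "->", suffix.label)
```

The function returns every candidate split rather than a single answer, since an ending alone is often ambiguous and a real analysis would need more data to choose between the candidates.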

Dtknv is a small tool that converts Cyrillic DOCX/ODT (MS Word, OpenDocument) documents into their Latin-script versions. It can be used for batch conversion. Dtknv uses “blunt” transliteration: the XML content is unpacked into RAM and then converted at the character level. The reverse conversion (Latin to Cyrillic) is available for plain text files only. It has an on-the-fly search-and-replace feature in which strings can be organised and exported. It is written in Python, with Tkinter/Tcl/Tk as the interface, which means it runs on many platforms. The code is here and a page in Serbian is also available.
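A rough sketch of what “blunt” character-level conversion can look like in Python follows. This is not the Dtknv source: the names `CYR_TO_LAT`, `to_latin`, and `convert_archive` are made up for the example, and the assumption is that every `.xml` member of the archive is UTF-8 and can be translated as a whole, tags included (the tags are plain ASCII, so they pass through unchanged).

```python
import io
import zipfile

# Serbian Cyrillic -> Latin, lowercase; three letters map to digraphs.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}

# Add uppercase letters; a capital digraph becomes title case ("Љ" -> "Lj").
_table = dict(CYR_TO_LAT)
for cyr, lat in CYR_TO_LAT.items():
    _table[cyr.upper()] = lat.capitalize()
TABLE = str.maketrans(_table)


def to_latin(text: str) -> str:
    """Transliterate character by character, with no word-level rules."""
    return text.translate(TABLE)


def convert_archive(data: bytes) -> bytes:
    """Rewrite every XML member of a DOCX/ODT archive held in memory."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            payload = src.read(item.filename)
            if item.filename.endswith(".xml"):
                payload = to_latin(payload.decode("utf-8")).encode("utf-8")
            # Reusing the original ZipInfo keeps member order and compression.
            dst.writestr(item, payload)
    return out.getvalue()


print(to_latin("Љубав и џем"))
```

Going the other way is harder to do bluntly, because a Latin "nj" or "lj" may be either one Cyrillic letter or two, which fits with the reverse conversion being limited to plain text.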

pyLatinam was one of my first projects in Python. It deals with declension and conjugation of Latin words.

FONRYE is a nice example in Python of how phonetic and hyphenation dictionaries can be joined to make a tool that searches for matches that meet the criteria for creating recorded phonetic data.

SNTRecorder is a tool in Python/Tk that I wrote to assist students in recording sound data. It allowed them to speak at their own pace while maintaining a uniform overall speed.
