SNTRecorder – A tool to assist speakers in corpus recording

The purpose of SNTRecorder is to assist in corpus recording. It was written to put speakers at ease during the recording process by letting them choose their own pace of pronunciation, while still preventing sentences from being read too fast. Written in Python, with an interface built in Tkinter (Tcl/Tk), it is a multiplatform tool that runs on Linux and Windows. Download SNTRecorder (source code, 48KB).

SNTRecorder
Compiled sentence where user is allowed to read it aloud

Assistance during a recording session

SNTRecorder is not recording software, just a small utility that assists in the process. The best way to understand what the program does is to read the workflow description (taken from my MA paper about phonetics):

  1. The project is loaded and the program creates the sentences and the time list, and then shuffles them.
  2. A user sits in front of the screen and enters their initials.
  3. The program shows a sentence and a red line in the lower part of the sentence screen. The sentence is crossed out at this step, as a signal to the speaker to read the sentence silently, without saying it aloud.
  4. The randomly selected time for the current sentence elapses and the red line changes to green, and the text appears normal.
  5. The speaker pronounces the sentence and presses the “next” key to continue.
  6. Steps 3 through 5 repeat until all sentences are recorded.
  7. The program informs the speaker that the current session is over and asks whether any items need to be repeated. If the answer is yes, a window is shown to select the sentences, and steps 3-6 are repeated for that selection. The session ends once nothing is left to re-record.
  8. A new user is ready and the cycle restarts.
SNTRecorder
Compiled sentence where user must wait to pronounce it
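
Steps 1, 3 and 4 of the workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual SNTRecorder source: the template string, the word list and the function name build_session are hypothetical, and the TIMEMIN/TIMEMAX values are simply set to the 1-4 range that appears in the sample log.

```python
import random

# Assumed values; the sample log reports "Random time: 1-4".
TIMEMIN = 1.0  # shortest wait, in seconds
TIMEMAX = 4.0  # longest wait, in seconds

# Hypothetical template and word list, modelled on the sample log.
TEMPLATE = 'The word "{}" is spoken.'
WORDS = ["theirs", "abjured", "bait", "dare"]


def build_session(words, template=TEMPLATE, tmin=TIMEMIN, tmax=TIMEMAX, rng=None):
    """Return a shuffled list of (sentence, wait_seconds) pairs."""
    rng = rng or random.Random()
    # Step 1: compile the sentences and draw one waiting time per sentence.
    items = [(template.format(word), rng.uniform(tmin, tmax)) for word in words]
    # Step 1 (continued): shuffle the presentation order.
    rng.shuffle(items)
    return items


if __name__ == "__main__":
    for sentence, wait in build_session(WORDS, rng=random.Random(0)):
        # Steps 3-4: a GUI would keep the sentence crossed out for `wait`
        # seconds and only then accept the "next" key.
        print(f"{wait:.2f}s  {sentence}")
```

Passing a seeded random.Random instance makes a session reproducible, which can help when testing; without it, every run produces a new order and new waiting times.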

Speaker logs

The program creates a simple log for each speaker. Here is a sample of a log.

# Started at: 2011-03-11-10:42
# Ended at: 2011-03-11-10:48
# Speaker: ana
# Random time: 1-4
# Time scale: 10

Sentence t(1/SCALE s) Next
The word "theirs" is spoken. 19.6834339953 10:44:07:607000
The word "abjured" is spoken. 22.7161564864 10:44:16:982000
The word "bait" is spoken. 26.945639852 10:44:23:597000
The word "dare" is spoken. 19.9025319695 10:44:34:642000
The word "fierce" is spoken. 16.5143768762 10:44:42:457000
The word "douse" is spoken. 19.3227666274 10:44:48:136000
The word "bourse" is spoken. 34.6668924964 10:44:54:766000
The word "fears" is spoken. 21.7415600234 10:45:02:441000
The word "Job" is spoken. 18.9044580715 10:45:10:100000

The log is preceded by time and speaker information. Random time gives the minimum and maximum values of the interval during which the user is prevented from moving to the next sentence. Each log line consists of the sentence that was shown on the screen, the randomly selected waiting time for that sentence, and finally the time when the speaker pressed the “next” key.
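
If you want to process such a log afterwards, a line can be split from the right, because only the sentence itself contains spaces. The sketch below is not part of SNTRecorder; the function name parse_log_line is hypothetical, the timestamp layout (hours:minutes:seconds:microseconds) is my reading of the sample, and the division by the scale follows the column header t(1/SCALE s) together with the "Time scale: 10" header line.

```python
from datetime import datetime

SCALE = 10  # taken from the "# Time scale: 10" header line


def parse_log_line(line, scale=SCALE):
    """Split one log line into (sentence, wait_seconds, pressed_at)."""
    # The sentence may contain spaces, so split off the last two fields only.
    sentence, t_raw, stamp = line.rsplit(None, 2)
    # The t column is given in units of 1/SCALE seconds.
    wait_seconds = float(t_raw) / scale
    # Assumed timestamp layout: hours:minutes:seconds:microseconds.
    pressed_at = datetime.strptime(stamp, "%H:%M:%S:%f").time()
    return sentence, wait_seconds, pressed_at


sample = 'The word "theirs" is spoken. 19.6834339953 10:44:07:607000'
sentence, wait_seconds, pressed_at = parse_log_line(sample)
print(sentence, round(wait_seconds, 2), pressed_at)
```

With a scale of 10, the value 19.68 corresponds to roughly 1.97 seconds, which falls inside the 1-4 range reported in the log header.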

Project settings, execution and OS difference

sntrecorder main window
Main window of the program

To create your own project, just make a copy of sampleproject.py and leave it in the same directory. Then edit the new file. The comments explain how to change the template sentence and insert new words. The file name (without extension) is your project name in the program. To change the minimum and maximum time users must wait at each sentence, edit TIMEMIN and TIMEMAX in the project.py file.

To run the program, go into the src folder and type python3 sntrecorder.py. If you’re using Windows, you’ll have to provide the full path to the Python executable, usually something like C:\python32\python.exe, but it depends on the Python version you are using (must be 3+).

Sound will not work on Linux (clicks for the start and end of a sentence, which I doubt you will need). Also, you will need the python3-tk library.

The program uses distinct strings for on-screen messages, which are editable in language.py, so you can provide localized versions. An example, in Serbian, is already there.

As you can see, this is a small tool that I created to make recording easier for students. Because it was just one of the tools I developed for my thesis, it is not user-friendly in terms of rich interface settings. But, thanks to open source and portability, you are free to adapt it to your requirements, with only a text editor and some patience. I hope you will find it useful.

Download SNTRecorder (source code, 48KB).

srmorph: Serbian Morphology in Python

My interest in linguistics and programming continues with an experiment in morphology, the srmorph project. It is a pilot endeavour I use to test ideas about parsing my native language (Serbian) at the word level and, later, at the syntactic level. This post is about the work in progress.

What Can Be Seen, Searched, Parsed?

For the time being, the project has only a Web/AJAX interface at http://srmorph.languagebits.com/ which allows:

Affixes as Basics

At the foundation of srmorph are Serbian affixes. I always wanted to write a parser that would work by first examining words at the level of prefixes and suffixes (infixes are a somewhat tougher problem). Therefore, the analysis is for now based on identifying affixes.

Environment and Data Format

The environment is the Python 3 programming language, while the grammar data format is based on Python classes themselves. The uninstantiated classes are the actual data containers, and once they inherit from the main meta classes, they become useful for parsing. For example, a class containing declension suffixes looks like this:

class AffNounDeclension0(MAffix):
    """Suffix. Example: 'доктор'. Ref. Klajn:51."""
    pos = 'MNoun'
    place = 'end'
    process = ('inflection', 'declension')
    subtype = 1
    gender = 'm'
    suffix = {0:'', 1:'а', 2:'у', 3:'а', 4:'е', 5:'ом', 6:'у'}
    blendswith = ('nonpalatal',)

The attribute suffix lists seven endings attached to some masculine nouns in Serbian (Croatian, Bosnian). The attribute pos identifies the word class, here a noun, and so on.
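
To give an idea of how such a class can be used as data, here is a minimal, self-contained sketch. It is not the srmorph algorithm: MAffix is replaced by an empty stand-in (the real base class is not shown in this post), and the helper functions decline and match_suffixes are hypothetical names introduced only for this illustration.

```python
class MAffix:
    """Stand-in for the srmorph base class, which is not shown in the post."""


class AffNounDeclension0(MAffix):
    """Suffix. Example: 'доктор'. Ref. Klajn:51."""
    pos = 'MNoun'
    place = 'end'
    process = ('inflection', 'declension')
    subtype = 1
    gender = 'm'
    suffix = {0: '', 1: 'а', 2: 'у', 3: 'а', 4: 'е', 5: 'ом', 6: 'у'}
    blendswith = ('nonpalatal',)


def decline(stem, affix):
    """Attach every ending of an affix class to a stem."""
    return {index: stem + ending for index, ending in affix.suffix.items()}


def match_suffixes(word, affix):
    """Return (stem, index, ending) for each non-empty ending the word ends with."""
    matches = []
    for index, ending in affix.suffix.items():
        # The empty ending is skipped because it would match every word.
        if ending and word.endswith(ending):
            matches.append((word[:-len(ending)], index, ending))
    return matches


forms = decline('доктор', AffNounDeclension0)
print(forms[5])
print(match_suffixes('доктором', AffNounDeclension0))
```

Because the class is never instantiated, the attributes are read directly from the class object. Note that an ending such as 'а' appears under two indices, so a word like 'доктора' yields two candidate analyses; narrowing them down would need further filtering.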

Parsing and Website

The inherited Serbian affix classes (60+) are so far parsed functionally. I have set up a dynamic website at http://srmorph.languagebits.com/ which shows some of the things that can be done by parsing. For now the algorithm is rather straightforward, until further filtering is introduced on word class level.

Once reasonably developed, the project will become open source.

screenshot: all classes where suffix "na"
Details about affix “na” in Serbian

Phonetics R, Praat code in GPL3, Paper and Data to Download

This post brings the R, Praat and Python code I used to write my Phonetics MA paper, as well as the paper itself to download, plus the acquired data. I won’t go into too many details about the downloads, but I hope they will be of some use to people searching for similar things or approaches, or simply show how useful free and open source software is to researchers.

R, Python, Praat Code

The R, Python, and Praat code is hosted on Github under the label r-diphthongs-sr-en (here is the zipped version, which may not be up to date, but should not be too different). The software tool that took me the most time to write was a set of scripts in the R language. They were designed to load the data I acquired with Praat, list tables and create plots (the R plots and diphthongs you can see here). The code takes the length, pitch, formants and intensity of diphthongs as its input.

Data: Diphthong Measurements, RP Speaker versus ESL Speakers

Praat TextGrid, drawn below waveform
A TextGrid with segments for the word/diphthong length, and the reference points in the constituent vowels for data measurement.

In my research I compared the lengths, formants, intensity and pitch of the selected diphthongs, as pronounced by a group of female ESL speakers (native language Serbian), with those of a reference RP speaker. The data (see it here or at the above links) was extracted by using Praat TextGrids (this is how I checked them), and if you’re interested in the methods and techniques I used to segment the files, you can see this chapter (the link to the full paper is below). The linked data contains 8 diphthongs in 2 contexts (short/long), as recorded and pronounced by 15 ESL speakers and 1 RP speaker. The diphthongs were pronounced within 32 words (I wrote this script to select the corpus).

MA Paper: “Pronunciation of English Diphthongs by Speakers of Serbian: Acoustic Characteristics”

The paper is titled “Pronunciation of English Diphthongs by Speakers of Serbian: Acoustic Characteristics” and the most current (but not error-free) version can be found here: http://www.languagebits.com/files/ma-paper/

So, Why Put All This Online?

Most of the code here is tailor-made for my research, and I am aware that it cannot be used as-is in another project. However, I believe it is a very useful heap of ideas. For example, the Praat scripts and TextGrids show some advanced tips for data extraction and control, which are backed up by a phonetic discussion about segmentation (itself a demanding task). The Python code is used for corpus search and integrates a script from NLTK (the Natural Language Toolkit) to verify the sound signal annotations (as well as for the control of recording, but more about that some other time). Finally, the R scripts show how a custom-made project is limited only by imagination, and how simple operations and filtering can significantly contribute to the final result (what I’m saying here is: don’t use Excel, learn R).

I also firmly believe that data, especially scientific data (even in such a humble work as an MA paper), should be free, and that ideas should be free. Moreover, I have in mind Ladefoged’s words from his Phonetic Data Analysis:

After you have written everything, I hope you will publish a complete account of the work, even if it is only on your web site. Private knowledge does the world no good. … In addition, make sure that your data is stored in such way that it can be found and used by others. (p 192)

Cheers!