This posts brings R, Praat and Python code I used to write my Phonetics MA paper, as well as the paper itself to download, plus the acquired data. I won’t go into too many details about the downloads, but I will note that I hope they will be of some use to people searching for similar things, approaches – or simply, to see how useful free and open source software is to researchers.
R, Python, Praat Code
The R, Python, and Praat code is hosted on Github under the label r-diphthongs-sr-en (here is the zipped version, which may not be up to date, but again not too different). The software tool that that took me the most time to write was a set of scripts in R language. It was designed to load the data I acquired with Praat and to list tables and create plots (the R plots and diphthongs you can see here). The code takes length, pitch, formants and intensity of diphthongs as the input.
Data: Diphthong Measurements, RP Speaker versus ESL Speakers
In my research I compared the lengths, formants, intensity and pitch of the selected diphthongs, as pronounced by of a group of female ESL speakers (native language Serbian), with a referent RP speaker. The data (see it here or at the above links) was extracted by using Praat TextGrids (this is how I checked them), and if you’re interested to see which methods and techniques I used to segment the files, you can see this chapter (the link to the integral paper is below). The data linked contains 8 diphthongs in 2 contexts (short/long), as recorded and pronounced by 15 ESL speakers and 1 RP speaker. The diphthongs were pronounced within 32 words (I wrote this script to select the corpus).
MA Paper: “Pronunciation of English Diphthongs by Speakers of Serbian: Acoustic Characteristics”
The paper is titled “Pronunciation of English Diphthongs by Speakers of Serbian: Acoustic Characteristics” and the most current (but not error free) version you will find here: http://www.languagebits.com/files/ma-paper/
So, Why Putting All This Online?
The most of the code here is tailor-made for my research, and I am aware that it cannot as-is be used in some other project. However, I believe it is a very useful heap of ideas. For example, Praat scripts and TextGrigs show some advanced tips for data extraction and control, which are backed up by a phonetic discussion about segmentation (itself a demanding task). The Python is used for corpus search and integrates a script from NTLK Toolkit to verify the sound signal annotations (as well as for the control of recording, but about that some other time). Finally, R scripts show how custom-made project is limited only by imagination, and how simple operations and filtering can significantly contribute to the final result (what I’m saying here is: don’t use Excel, learn R).
I also firmly believe that data, especially scientific (even in a such humble work, as an MA paper is), should be free, and that ideas should be free. Moreover, I have in mind Ladefoged’s words from his Phonetic Data Analysis:
After you have written everything, I hope you will publish a complete account of the work, even of it is only on your web site. Private knowledge does the world no good. … In addition, make sure that your data is stored in such way that it can be found and used by others. (p 192)