The Euclidean Distance in Diphthongs – R Graph and Code

Representing and plotting a distance on an F1/F2 graph, in terms of the Euclidean distance, is relatively easy in R. This post shows one way of doing it. First, we provide sample data consisting of F1 and F2 values for two diphthong targets. Then we draw the diphthong positions with their starting and ending targets and, finally, calculate the distance. This R code does most of the F1/F2 calculations and drawing.

Data sample (Formants in a Diphthong)

First, a data sample.

    ascii   ipa     f1      f2
1  aw_l_1 ɑʊl_1 900.96 1600.10
2  aw_l_2 ɑʊl_2 373.61 1082.59 

    ascii   ipa     f1      f2
1  aw_l_1 ɑʊl_1 823.07 1542.39
2  aw_l_2 ɑʊl_2 411.39 1405.78

These are the values of the first two formants in /ɑʊ/, as measured in a group of 15 female ESL students and one RP speaker (also female). The number 1 in the notation marks the first vowel target and 2 the second (thus, aw_l_1 is /ɑ/ and aw_l_2 is /ʊ/), while the “l” marks a long diphthong. The two targets are the start and end points of a line, and the line’s length is expressed by the Euclidean distance.

The Euclidean Distance

Euclidean distance is the straight-line distance from point A to point B in a Cartesian system, and it is derived from the Pythagorean theorem. Thus, if point p has the coordinates (p1, p2) and point q has the coordinates (q1, q2), the distance between them is calculated using this formula:

distance <- sqrt((p1-q1)^2 + (p2-q2)^2)

Our Cartesian coordinate system is defined by the F2 and F1 axes (with F1 on the y-axis), and the distance in question is the one from one diphthong target to the other. The vowel targets, which correspond to points A and B, are defined by the F1/F2 values in Hertz for a particular vowel. In our example above, A and B are rows 1 and 2 of each table, and their coordinates are the F2 and F1 frequencies.
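As a quick check of the formula, here is a minimal sketch that applies it to the two sample tables above. It is written in Python rather than R purely for illustration, and the names euclidean_distance, first_sample and second_sample are hypothetical, not taken from the original script.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two (F1, F2) points, in Hz."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

# (F1, F2) targets copied from the two sample tables above
first_sample = {"aw_l_1": (900.96, 1600.10), "aw_l_2": (373.61, 1082.59)}
second_sample = {"aw_l_1": (823.07, 1542.39), "aw_l_2": (411.39, 1405.78)}

d_first = euclidean_distance(first_sample["aw_l_1"], first_sample["aw_l_2"])
d_second = euclidean_distance(second_sample["aw_l_1"], second_sample["aw_l_2"])

print(round(d_first, 2))   # 738.86
print(round(d_second, 2))  # 433.75
```

The two results agree with the aw_l row of the distance table later in this post (738.86 for the RP speaker, 433.75 for the ESL students), which suggests that the first sample table holds the RP speaker's values and the second the ESL students'.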

Plotting in R

The third step in the process is plotting, so that we can see a graphical representation of the distance. We can do that by:

  1. Drawing the F1/F2 “coordinate system”.
  2. Drawing the vowels in A and B positions, and connecting them with a line.
  3. Drawing the arrows showing the direction of pronunciation and placing the IPA symbols.

An example looks like this:

Figure: The English diphthongs as pronounced by the ESL students and a native RP speaker, drawn on an F1/F2 plot.

The diphthong /ɑʊ/ is plotted in the lower right corner of the graph. Here are the Euclidean distances for that diphthong (in both variants):

          RPSpeaker  ESLStudents
aw_l ɑʊl  738.86     433.75
aw_s ɑʊs  816.08     471.60
The R code used to plot the graph can be found here.

IPA Symbols in R

This post is an example of how to place IPA (International Phonetic Alphabet) symbols in R charts. I achieved that by using the hexadecimal values of the corresponding Unicode symbols. There may be a more direct approach, but I am unaware of one.

A plot is created as usual, but the IPA labels are stored in a separate vector:

diph.names.ipa <- c('e\u026A', 'a\u026A', '\u0254\u026A')

The hex values of IPA symbols are available here.
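If a lookup table is not at hand, the hex value can also be read off the symbol itself. The sketch below is in Python, not R, and only illustrates the idea; to_escape is a hypothetical helper name. It rebuilds the same three labels from their escapes and then goes the other way, from a symbol to its \u escape.

```python
# The same code points as in the R vector above, written as Python escapes
diph_names_ipa = ["e\u026A", "a\u026A", "\u0254\u026A"]
print(diph_names_ipa)  # ['eɪ', 'aɪ', 'ɔɪ']

def to_escape(symbol):
    """Return the \\uXXXX escape for a single character."""
    return "\\u{:04X}".format(ord(symbol))

print(to_escape("ɪ"))  # \u026A
print(to_escape("ɔ"))  # \u0254
```

The printed escapes can be pasted into an R string as they are, since R accepts the same \uXXXX notation inside quotes.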

A sample graph created with this R script looks like this:

A sample graph showing IPA symbols drawn by plot() command.


If you are working with R in ESS, IPA symbols are displayed differently on Windows and Linux. On Windows the characters are shown in hex notation, at least in my case. On Linux, on the other hand, the symbols are shown as IPA, so it is much easier to work with them:

Screenshot: IPA symbols within a data frame object in R (ESS on Linux)

The table above is sorted and ready to be inserted into a text editor. If you are using Word or Writer, you can copy and paste the table with a quick workaround, for which you need OpenOffice (or LibreOffice) installed. Open the Calc application, select the first cell and paste the table from Emacs. In the options that appear, select “Space” and “Merge delimiters” under “Separated by” and confirm. The next step is to copy the table from Calc and paste it where needed:

Vowel F1 F2 F3
ɑ 891.89 1656.59 2564.01
a 700.65 1389.3 2871.73
ɛ 585.82 1909 2713.09
e 532.55 2197.79 2714.36
ɔ 493.94 1270.26 2604.23
ʊ 383.08 1240.57 2610.09
ɪ 383.48 2308.99 2719.21
ə 480.32 1680.69 2652.19
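The “Space” plus “Merge delimiters” import can also be scripted. The following is a minimal Python sketch, not part of the original workflow: it splits each pasted line on runs of spaces and writes the rows out tab-separated, a form that spreadsheets and word processors usually accept as a table. The variable names are hypothetical and only the first three vowels are included.

```python
import csv
import io

# A few rows of the space-separated table, as copied from the R console
pasted = """Vowel F1 F2 F3
ɑ 891.89 1656.59 2564.01
a 700.65 1389.3 2871.73
ɛ 585.82 1909 2713.09"""

# str.split() with no argument splits on runs of whitespace, which is
# the scripted equivalent of "Space" with merged delimiters in Calc
rows = [line.split() for line in pasted.splitlines() if line.strip()]

# Write the rows back out with a single tab between columns
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
print(buffer.getvalue())
```

Note that a data frame printed in R normally starts each row with a row number, which would come through as an extra first column.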


FONRYE English Dictionary: Phonetic and Syllable Search

It’s not always possible to find a good searchable phonetic dictionary. That is why I created a free and open source program that searches phonetically transcribed words and filters the results against some basic rules. It uses BEEP and Moby Hyphenator II sources.

Download: FONRYE 0.3.3 (2.2 MB)

In this post: why phonetic dictionary search, what is FONRYE, download and search, settings and results, credits.

Why I needed a searchable phonetic dictionary

For several months I have been working on my M.A. in experimental phonetics. One of the prerequisites is an acceptable corpus. My work is about the English diphthongs. However, the diphthongs have to be pronounced after voiced plosives and before voiced or voiceless plosives, and the words containing them should preferably be monosyllabic.

Making a corpus is not an easy task, and it involves a painstaking search for suitable material. I had no searchable phonetic dictionary of any sort (a version of Macmillan Advanced Dictionary refused to work). It was pure luck, then, to come across a paper whose bibliography listed one interesting source: the University of Cambridge public FTP server. That is where I found BEEP and MH2 and decided to compile my own searchable dictionary, hopefully usable for making the corpus.

What FONRYE is, and what it is not

FONRYE (named after fonetski rječnik in Serbian) is a very simple program (or script, if you like) written in Python 2.6. It is a specific piece of software I created for personal use: to search for diphthongs in a phonetic context. It does not have any fancy search rules or regular expression syntax. The plan was to use regular expressions, but they were very slow to run; I guess that can be improved if needed. So, please bear in mind that it was not planned for release: the code may contain strange comments, bad spelling etc.

Its settings are contained in the script itself, in 4 lines of code, which will be explained later. Here’s an example:

before = ('m', 'n', 'r', 'l' ),
after = sounds['voiceless'] + sounds['voiced'],
diphthongs = sounds['diphthongs'],
syllable = 0

The user enters the desired search conditions and runs the program, which then saves the results in a folder together with a short info file.

Screenshot: FONRYE phonetic dictionary search, code view

How to use FONRYE

  1. On Windows/Mac: Download Python, but a version lower than 3.0; version 2.6 is preferred. On Linux: You already have Python installed, but make sure you have an “old” version as well (again, prior to version 3).
  2. Download the FONRYE files and unpack them. Please make sure you do not delete the ‘results’ folder, or the program will not work.
  3. Open the file for editing. On Windows: Go to the Start menu and find IDLE inside the Python folder. On Linux: Use any plain text editor that supports code editing, such as gedit, or install Python IDLE from your OS repository. Enter your settings and save the file. Finally, run the program (double click, or press F5 in IDLE).
  4. The program will start the search, and after it finishes the results will be in results/fonyre_results_n, where n is the search counter.

Settings and results format

In the steps above you opened the file for editing. Here is how to enter the “settings”. First, locate these lines:

before = (),
after = (),
diphthongs = (),
syllable = 0

Do not modify anything except the content inside the brackets and the syllable number (that is, unless you are familiar with programming). By the way, syllable = 0 means words with 1 syllable, syllable = 1 words with 2 syllables, and so on. Enter your phonemes in the brackets. For example, the settings:

before = ('b', 'd'),
after = ('p', 't'),
diphthongs = ('ay',),
syllable = 0

…will search for all words containing the diphthong ay (IPA: aɪ) where the diphthong is between b/d and p/t. After the search is done, go to the ‘results/fonrye_search’ folder and locate search_info.txt (here is a sample vowel search info). That file holds information about your search, including a unique mark (ID) that is placed in all result files to keep track of searches and results. The folder ‘files’ is where your search results are placed. For the sample search, the program produced the following results file:

# uniqueid-dzzic
BIGHT		b ay t
BITE		b ay t
BLIGHT		b l ay t
BRIGHT		b r ay t
BRIGHTS		b r ay t s
BY-PASS		b ay p aa s
DIGHT		d ay t

You can find the computer phonetic code in phoncode.txt in the ‘data’ directory or in the file beep-1.0-edited. Please enter only this plain phonetic notation; IPA symbols are not accepted.


I was able to create and use this little project, and place it on the Net, thanks to two people who provided its core: a phonetic dictionary and a hyphenation dictionary. The phonetic dictionary was compiled by Tony Robinson from the Cambridge University Engineering Department; the Moby Hyphenation dictionary was created by Grady Ward. Both projects were placed into the public domain in 1996. See the Bibliography page for FTP addresses.

My own contribution is some hastily written, not very pretty, slow Python code, which you are free to improve and share.

Feedback is very welcome!