Resonant frequencies and the vocal tract length

This post is about the resonant frequencies of a tube, in the context of speech and the neutral vocal configuration. Two formulas are given: the first calculates the resonant frequencies when the length is known, and the second calculates the length when the frequency of a formant is known. Finally, there is a real-life example: a calculation of a speaker’s vocal tract length after measuring the formants in schwa.

The speech mechanism in vowels is described by a model that uses the physical properties of tubes. A tube is a simple apparatus that, if attached to a source of sound, can emit harmonic frequencies. When a loudspeaker is attached at one end, the tube acts as a resonator that “has an infinite number of resonances, located at frequencies given by odd-quarter wavelength” (Kent and Read 14). The resonant frequencies of a tube closed at one end are calculated by using this formula (Johnson 96):

F_n = \frac{(2n-1)c}{4L}

Here n is a positive integer (the resonance number), L is the length of the tube in cm, and c is the speed of sound (about 35,000 cm/s).

This was very interesting to me, so I decided to experiment with the formula in the R language. The purpose was to calculate average frequencies of a vocal tract in the neutral configuration (a position of the vocal organs in which a tube without obstacles is created from the larynx to the lips). Written in R, the formula above looks like this:

freq <- ((2*i-1)*35000)/(4*tract.len)

For a given speed of sound c = 35000, the formant number i and the tract length, we can calculate estimated formant values. As an example, we can insert L = 17.5 cm into the formula, the average length of the human vocal tract from glottis to lips (15). In this case the first formant, or the first resonance frequency, occurs at 500 Hz, the second at 1500 Hz, the third at 2500 Hz, and so on. Here is the output from the R code:

> Resonance(17.5)
Tract length is 17.5 cm.
formant 1: 500 Hz
formant 2: 1500 Hz
formant 3: 2500 Hz
formant 4: 3500 Hz
formant 5: 4500 Hz
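
The Resonance function itself is not reproduced in this post, so the following is only a minimal sketch of how such a function could look. The name resonance_sketch and its arguments are assumptions made for illustration, not the original code.

```r
# Hypothetical sketch, not the original Resonance() code.
# Resonances of a tube closed at one end: F_n = (2n - 1) * c / (4 * L)
resonance_sketch <- function(tract.len, n.formants = 5, c.sound = 35000) {
  i <- seq_len(n.formants)                       # formant numbers 1, 2, ...
  freq <- ((2 * i - 1) * c.sound) / (4 * tract.len)
  cat("Tract length is", tract.len, "cm.\n")
  for (k in i) {
    cat("formant ", k, ": ", freq[k], " Hz\n", sep = "")
  }
  invisible(freq)                                # return the values for further use
}

f <- resonance_sketch(17.5)
```

Because the formant number is a vector here, all resonances are computed in one step, and the returned values can be reused without parsing the printed text.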

Of course, we can reverse the calculation; by entering formant frequency and the order of the formant we can calculate an average length:

prep <- 35000*((formant/2)-0.25)
length <- (prep/freq)
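
These two lines are the first formula solved for L. As a quick check of the algebra, with n standing for the formant number and F_n for its frequency:

```latex
F_n = \frac{(2n-1)\,c}{4L}
\quad\Longrightarrow\quad
L = \frac{(2n-1)\,c}{4F_n}
  = \frac{c\left(\frac{n}{2} - \frac{1}{4}\right)}{F_n}
```

With c = 35000 cm/s, the numerator of the last fraction is the quantity stored in prep.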

This is the result of the Length function in the code:

> Length(1000, 1)
Estimated tract length is 8.75 cm, where formant number 1 has value of 1000 Hz.

This length corresponds to vocal tract lengths measured in infants.

spectrogram and waveform
A spectrogram and waveform near the end of the word "abjured". The three red lines show the formants, while the vertical line shows the measurement point. Analysed in Praat.

To make the calculations even more interesting, we can measure the frequency of the first formant of speakers, and then “calculate” the lengths of their vocal tracts. Here is an example: we recorded a speaker and examined the sound data. Since the schwa sound is pronounced in (approximately) the neutral configuration, we measured the formants where this sound (IPA: ə) was articulated. In this case, that was near the end of the word abjured /əbˈdʒʊəd/. The first three formant values in the sample female speaker were:

Time_s   F1_Hz   F2_Hz   F3_Hz
4.633178   549.304326   1750.098455   2915.885791

If we enter 549.3 Hz in the second formula, we get:

> Length(549.304326,1)
Estimated tract length is 15.92 cm, where formant number 1 has value of 549.3043 Hz.

This is, it seems, an acceptable value for this speaker.
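
The same calculation can be repeated for F2 and F3 from the table above. The sketch below is an illustration only: tract_length is a hypothetical helper, not the Length function that produced the output above, and it assumes that the uniform-tube model holds for all three formants.

```r
# Hypothetical helper (not the original Length() code):
# L = (2n - 1) * c / (4 * F_n), with c in cm/s and F_n in Hz.
tract_length <- function(freq, formant, c.sound = 35000) {
  ((2 * formant - 1) * c.sound) / (4 * freq)
}

# Formants measured in schwa (values from the table above)
measured <- c(F1 = 549.304326, F2 = 1750.098455, F3 = 2915.885791)

# One length estimate per formant, then their mean
estimates <- tract_length(measured, formant = 1:3)
print(round(estimates, 2))
print(round(mean(estimates), 2))
```

If the tube model fitted perfectly, the three estimates would coincide. Any spread between them suggests how far the actual articulation departs from a uniform tube.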

The measurements and the image were obtained by using Praat, free phonetic software. The calculation and the code example were written in the R programming language.

Sound (Related to Speech)

Sound is a form of energy (Crystal 32). It is a series of pressure fluctuations in a medium (Johnson 4). In speech the medium is usually air, although sound can also propagate through solid objects and water, for example. Once the air particles become energised by the vibration of the vocal folds, a series of rarefaction and compression events begins. Compression occurs when particles are shifted closer to each other, which results in increased density within the medium. Rarefaction is the opposite: the particles move apart, so the density of the medium decreases.

Compression, rarefaction, and other terms related to acoustics are often explained through a simple device – a pendulum. A pendulum, or a swing, is “a weight hung from a fixed point so that it can swing freely” (Oxford Dictionary). Once set in motion it will oscillate between two maximum points and its central, equilibrium, position.

A simple pendulum with minimum, maximum and equilibrium points

Here is a graphical representation of a pendulum. The point E is the equilibrium, while the points M1 and M2 mark the maximum points on both sides of the pendulum. The swinging motion from E to M1, then back to E and up to M2, can be shown in the coordinate system as a sinusoid. The figure shows such a sinusoid, with a series of maximum and minimum swinging points. The points where the sinusoid crosses the horizontal line show the phase of the oscillation at which the pendulum passes through its starting point E.

Particles do not travel through a medium; instead, they create a propagating pressure fluctuation: “A sound wave is a travelling pressure fluctuation that propagates through any medium that is elastic enough to allow molecules to crowd together and move apart” (Johnson 3). In other words, while each particle moves back and forth and acts “like the bob of pendulum … the waves of compression move steadily outward” (Ladefoged, Elements, 8). Here is an animation of the air molecules in a propagating sound wave.

Combined, a pendulum and a sinusoid illustrate the properties of sound waves, and they help explain the terminology related to the physics of speech. For example, the distance between points E and M1 (or E and M2) is the amplitude. It shows the maximum oscillation points of the particles or, in sound, “the extent of maximum variation in air pressure” (Ladefoged, Elements, 14). A pendulum’s period (or cycle) is the time it takes to travel from E to M1, to M2 and back to E. The number of such periods in a second is the frequency, and it is measured in hertz (Hz). A pendulum with one oscillation per second has a frequency of 1 Hz (equation 1). A sound of 100 Hz has an identifiable part that repeats once every hundredth of a second.

1 Hz = 1/s
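
Equation 1 also gives the duration of a single cycle, since period and frequency are reciprocals of each other. A small sketch in R (the function name period is chosen here only for illustration):

```r
# Period (seconds per cycle) is the reciprocal of frequency (cycles per second)
period <- function(freq.hz) 1 / freq.hz

period(1)    # a pendulum with one oscillation per second
period(100)  # a 100 Hz sound
```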

The energy of a sound wave depends on the force that created it. The more energy that goes into making the sound wave, the greater the pressure level it creates in the medium. The energy of a sound wave is related to its amplitude: a very strong wave will have a large amplitude, and vice versa. Sound pressure level, or intensity level, is expressed in decibels (dB).

The human ear is sensitive to a very wide range of sound intensities, estimated at 10^13 units of intensity (Crystal 36). For easier reference, a logarithmic scale is used. Thus, the 10^13 units are scaled to 130 dB (36).
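
The scaling from 10^13 to 130 dB follows from the usual definition of the decibel as ten times the base-10 logarithm of an intensity ratio. A short sketch (intensity_db is a name chosen here for illustration):

```r
# Intensity level in decibels for a given intensity ratio I / I0:
# dB = 10 * log10(I / I0)
intensity_db <- function(ratio) 10 * log10(ratio)

intensity_db(1e13)  # the full range of intensities quoted above
intensity_db(10)    # each tenfold increase in intensity adds 10 dB
```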

The simple sinusoid below is an abstraction of a simple periodic sine wave. To describe it, three items are needed: amplitude, frequency and phase [1] (Johnson 7). From the picture we see that the frequency of the wave is one cycle per unit of time, while the amplitude reaches its peaks at 2 and -2 on the vertical scale. Unlike simple periodic waves, complex periodic waves “are composed of at least two sine waves” (8). Such a complex wave has a pressure oscillation (an amplitude) that results from the pressure oscillations of at least two waves (Ladefoged, Elements 37) and, of course, from the phases of the waves involved. Every complex wave can be seen as composed of several simple waves, and the merit of such a model is that “any complex waveform can be decomposed into a set of sine waves having particular frequencies, amplitudes and phase relations” (Johnson 11). The process of “breaking complex wave down into its sinusoidal components” (Clark 203) is well known in physics and is called Fourier analysis, named after the scientist who “developed its mathematical basis” (203) in the 19th century.

A sinusoid graph
A sinusoid with equilibrium, maximum and minimum points corresponding to the pendulum movements
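
The decomposition of a complex wave into sine components can be tried out in R with the built-in fft function. The sketch below is only an illustration: the sample rate and the two components (100 Hz with amplitude 2, and 300 Hz with amplitude 1 and a phase shift) are arbitrary values chosen here.

```r
# Build a complex periodic wave from two sine components, then recover
# the components with the fast Fourier transform (fft in base R).
fs <- 1000                    # samples per second (assumed)
t  <- (0:(fs - 1)) / fs       # one second of time points

sine <- function(amp, freq, phase = 0) amp * sin(2 * pi * freq * t + phase)

wave <- sine(2, 100) + sine(1, 300, phase = pi / 2)

# Magnitude spectrum, scaled so that each peak equals a component amplitude
spec  <- Mod(fft(wave)) / (length(wave) / 2)
freqs <- (0:(length(wave) - 1)) * fs / length(wave)

# Keep the lower half of the spectrum and list the clear peaks
half  <- seq_len(length(wave) / 2)
peaks <- which(spec[half] > 0.5)
print(data.frame(freq_hz = freqs[peaks], amplitude = round(spec[peaks], 3)))
```

The magnitude spectrum recovers the frequency and amplitude of each component. The phases are not shown here, but they can be read from the same transform with Arg.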

The second group of waves is aperiodic waves. They are characterised by the lack of a repetitive pattern. Two types of waves are grouped under the term aperiodic: white noise and transients. White noise has a completely random waveform, while the waveform of a transient does not repeat; in speech, an example of white noise is a fricative such as [s] (Johnson 12). Aperiodic sounds can also be subjected to Fourier analysis.

Sometimes pressure fluctuations in the form of sound that hit an object cause the object to vibrate. The vibrations occur if the acting frequency is within the “effective frequency range”, or resonator bandwidth (Ladefoged, Elements 68). Such induction of vibrations by another vibrating object is called resonance. Every object has a specific range of frequencies that it can respond to, and those frequencies correspond to the dominant frequencies of the sound the object can create – or as Ladefoged explains it: “… [T]he resonance curve of a body has the same shape as its spectrum” (65). In speech, the speech organs function as resonators: they filter (enhance and dampen) properties of waves, which are recognised as the speech sounds.

[1] Phase is “the timing of the waveform relative to some reference point” (Johnson 8).

You can get SVG versions of the images (click for the pendulum or for the sinusoid).

This post is based on a draft for one of the introductory chapters in my paper.

Formant synthesis application

Jonas Beskow at the Centre for Speech Technology, KTH Stockholm, wrote the free Formant Synthesis Demo, a computer programme that runs on Windows and Linux (and on any other OS for which the application can be compiled from the open source code the author kindly uploaded).

The programme synthesises F1, F2, F3 and F4 formants from several sources (rectangle, triangle, sine, sampled and noise). It “demonstrates formant-based synthesis of vowels in real time, in the spirit of Gunnar Fant’s Orator Verbis Electris (OVE-1) synthesiser of 1953” (from the About window).

“Formants are defined by Fant as ‘the spectral peaks of the sound spectrum |P(f)|’ of the voice. Formant is also used to mean an acoustic resonance, and, in speech science and phonetics, a resonance of the human vocal tract. It is often measured as an amplitude peak in the frequency spectrum of the sound, using a spectrogram (in the figure) or a spectrum analyzer, though in vowels spoken with a high fundamental frequency, as in a female or child voice, the frequency of the resonance may lie between the widely-spread harmonics and hence no peak is visible. In acoustics, it refers to a peak in the sound envelope and/or to a resonance in sound sources, notably musical instruments, as well as that of sound chambers” (Wikipedia).
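
To give a rough idea of what formant-based synthesis involves, here is a minimal sketch in base R. It is not the code of the Formant Synthesis Demo: the sample rate, the fundamental frequency, the 80 Hz bandwidth, the impulse-train source and the cascade of second-order digital resonators are all assumptions chosen for illustration. The formant values are those of the neutral tract calculated earlier.

```r
# Minimal cascade formant synthesis sketch (base R only).
# All parameter values below are illustrative assumptions.
fs <- 16000                      # sample rate in Hz
f0 <- 100                        # fundamental frequency in Hz
n  <- fs                         # one second of samples

# Source: one impulse per glottal period
src <- numeric(n)
src[seq(1, n, by = fs / f0)] <- 1

# Second-order digital resonator with unit gain at 0 Hz:
# y[t] = gain * x[t] + b1 * y[t-1] + b2 * y[t-2]
resonator <- function(x, freq, bw) {
  b2   <- -exp(-2 * pi * bw / fs)
  b1   <- 2 * exp(-pi * bw / fs) * cos(2 * pi * freq / fs)
  gain <- 1 - b1 - b2
  as.numeric(stats::filter(gain * x, c(b1, b2), method = "recursive"))
}

# Pass the source through one resonator per formant
formants <- c(500, 1500, 2500)
out <- src
for (f in formants) out <- resonator(out, f, bw = 80)

out <- out / max(abs(out))       # normalise to the range -1..1
```

Each resonator boosts the harmonics of the source that lie near its centre frequency, which is the filtering role the text assigns to the vocal tract. Changing the values in formants changes the vowel quality of the result.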

Formant Synthesis Demo
The window of the Formant Synthesis Demo

The download link is on the Formant Synthesis Demo site.