
The Speech Organs and Airstream

2011 November 11
by Mlinar

Speech is produced by the speech organs: the airstream causes the vocal folds to vibrate (this applies to the egressive airstream mechanism). The resulting sound then moves through the articulatory system, attaining its final form – one of the sounds used in the language of the speaker. This text is an overview of what happens to the air on its way out of the vocal tract.

The air from the lungs enters the larynx, a structure that consists of several cartilages: the thyroid, cricoid and arytenoid (Ogden, Introduction 40). The larynx is about 11 cm long and 2.5 cm in diameter (Clark 30). The angle formed by the sides of the thyroid cartilage is 90° in males and 120° in females (30). This physical difference intrinsically influences voice quality (although the quality can be culturally influenced as well [1]).

[Figure: the most relevant elements of the vocal tract (Ogden 10)]

The epiglottis, a leaf-shaped cartilage that closes the airways during swallowing and thus protects sensitive tissue, is located above the larynx. The larynx houses the vocal folds, “typically about 17 to 22 mm long in males and about 11 to 16 mm long in females” (32). The cartilage structure that surrounds the vocal folds and the vocal folds themselves form the glottis, a “laryngeal valve aperture” (32).

Above the epiglottis is the pharynx, a muscular passage that connects the oral cavity, the larynx and the velum. The pharynx is passively involved in speech (42), because it modifies the size of the space between the oral cavity and the larynx. The velum, a soft tissue, is located above the pharynx. It directs the airflow in speech: if raised, it closes the velopharyngeal port, an opening to the nasal cavity [2] (46).

The oral cavity is the part of the vocal tract whose size and shape humans can control the most (O’Connor, Phonetics 34), which makes it critical for “determining the phonetic qualities of speech sounds” (Clark, Introduction 47). It is the space between the lips (anteriorly [3]), the palatoglossus muscle (posteriorly), the tongue (inferiorly) and the roof of the mouth (superiorly) (47). The lips, the tongue and the angle of the mandible all play a role in speech sound production, although not an equally important one (for example, it is possible to make a distinctive sound with the mandible fixed) (47). Considering the complex muscular and neural structure of the mobile parts that surround the oral cavity, it is no surprise “that the characteristics of vowels depend on the shape of the open passage above the larynx” (Jones, Outline 29). Of course, this applies not only to vowels but to all speech sounds; what makes vowels interesting, however, is the lack of any closure in the passages, so their quality is conditioned by the shape of the passages, or the “inherent properties of the cavities” (Crystal 27).

When the tongue is moved backwards or forwards, the space in the pharyngeal region changes, and when it moves upwards or downwards (usually accompanied by mandible movement), the space defined by the hard palate and the tongue changes in volume and shape (Stevens 22). According to Johnson, the volume [4] of the vocal tract is about 170 cm³ in males and 130 cm³ in females; when the mandible is lowered by about 1 cm (the average in speech), the volume increases to 190 cm³ and 150 cm³, respectively (24). Citing Goldstine, Johnson gives 14.1 cm as the average vocal tract length in adult females, 6.3 cm as the pharynx length and 7.8 cm as the oral cavity length. In males, the values are 16.9 cm, 8.9 cm and 8.1 cm, respectively (25). This shows that the oral cavity is almost the same length in both sexes, while the difference lies in the length of the pharyngeal region (25).
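The comparison in the paragraph above is simple arithmetic, and a short R sketch can make it explicit. The segment lengths and volumes are the ones quoted from Johnson; the data frame and its column names are hypothetical, chosen only for this illustration.

```r
# Segment lengths (cm) and volumes (cm^3) as quoted above from Johnson (24-25).
# The data frame layout and column names are illustrative assumptions.
tract <- data.frame(
  sex         = c("female", "male"),
  pharynx_cm  = c(6.3, 8.9),
  oral_cm     = c(7.8, 8.1),
  vol_neutral = c(130, 170),   # neutral configuration
  vol_lowered = c(150, 190)    # mandible lowered by about 1 cm
)

# Pharynx plus oral cavity: the overall length implied by the two segments.
tract$segments_cm <- tract$pharynx_cm + tract$oral_cm

# Male minus female difference for each segment.
pharynx_diff <- diff(tract$pharynx_cm)   # about 2.6 cm
oral_diff    <- diff(tract$oral_cm)      # about 0.3 cm

# Volume gained by lowering the mandible, per sex.
tract$vol_gain <- tract$vol_lowered - tract$vol_neutral

print(tract)
```

The two segments add up to about 14.1 cm for females and 17.0 cm for males, and almost all of the male/female difference comes from the pharynx (about 2.6 cm against about 0.3 cm for the oral cavity).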

The physiology of the vocal tract links anatomy with phonetics. It describes, in mechanical terms, the properties and dimensions of the environment in which speech sounds are created.


[1] “There are cultural effects too: in English-speaking cultures, it is common for males to enhance their intrinsically lower f0 by lowering their larynx, and for females to enhance their intrinsically higher f0.” (Ogden, Introduction 46)

[2] The velopharyngeal port is very important in discussing nasal sounds, where the air stream has a complex path that includes several cavities and an intricate physical model.

[3] Anterior/posterior – in human anatomy, the axis from the front of the body to the back.

[4] The values refer to the measurements when the vocal tract is in the neutral configuration.

This post is based on a draft for one of the introductory chapters in my paper.
Previous text: Speech and the Respiratory System
Next text: Sound (Related to Speech) 

Speech and the Respiratory System

2011 November 7
by Mlinar

Speech, a form of human communication, is produced by three groups of organs working together: respiratory, phonatory and articulatory. The dominant elements of the respiratory system [1] are the lungs, the chest wall and the diaphragm. Working together, they provide the mechanical energy in the form of air pressure, the aerodynamic energy of speech (Kent & Read 2), needed to produce sound in the larynx, the place of phonation. The tongue, the lips, the jaw and the velum, the articulatory elements of the speech organs, modify the properties of the created sounds. The extent of modification depends on several factors, including the position of the articulatory organs, the intensity of the sound (pressure), the physical properties of the tissues, etc.

The respiratory system [2] is located in the chest (thorax) – a cavity formed by the rib cage and the muscles. The ribs are connected posteriorly to the vertebral column, and anteriorly to the sternum (breast-bone). The thoracic cavity is bounded at the top by the shoulder blades (scapulae), and at the bottom by the diaphragm. The lungs are located within the thoracic cavity: they are a cone-shaped organ made of sponge-like matter, consisting of many bronchioles that branch into numerous alveoli. The lungs and the inner surface of the rib cage are connected by pleural linkage, a fluid-like matter that makes it possible for the lungs to expand or shrink together with the cavity. The lungs act as bellows (Crystal 20): when the chest muscles flex, the pressure inside the lungs increases, which forces air out; conversely, when the diaphragm is lowered or the rib muscles flex, the pressure inside the lungs decreases, which draws air into the respiratory system.

[Figure: the respiratory system (Wikipedia)]

There are two important phases in respiration that are related to speech: inspiration and expiration. Together they make up the respiratory cycle, which is relevant not only in providing the energy, but also “in the sequential organization of speech” (Clark, Introduction 21). Inspiration, or inhalation, occurs when the thoracic volume increases, which lowers the pressure in the lungs. This pressure difference causes air to enter the system. The increase of space within the thorax is achieved by the rib cage moving upwards (caused by the shortening of the intercostal muscles) or by the lowering of the diaphragm. Expiration, or exhalation, is achieved by the “elastic recoil forces” or relaxation pressure (24).
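The link between thoracic volume and lung pressure described above can be illustrated with Boyle's law (p1 · V1 = p2 · V2 for a fixed amount of gas at constant temperature). This is an idealisation brought in here for illustration and is not taken from the cited sources; the lung volumes below are hypothetical round numbers, not measurements.

```r
# Boyle's law for a fixed amount of gas at constant temperature:
#   p1 * v1 = p2 * v2
# pressure_after() is a hypothetical helper written for this illustration.
pressure_after <- function(p1, v1, v2) {
  p1 * v1 / v2
}

p_atm      <- 101.3  # kPa, approximate atmospheric pressure
v_rest     <- 3.00   # litres, hypothetical lung volume before inhaling
v_expanded <- 3.03   # litres, hypothetical volume after the thorax expands

# Pressure in the lungs just after the expansion, before any air has moved.
p_lungs <- pressure_after(p_atm, v_rest, v_expanded)

# A negative difference means the pressure inside is below atmospheric,
# so air flows in until the two pressures are equal again.
p_lungs - p_atm
```

Running the same function with a smaller second volume gives a pressure above atmospheric, which corresponds to the expiratory phase.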

[Figure: a still image from real-time MRI taken during speech (Wikipedia)]

However, in situations where more energy is needed (shouting, prolonged speaking), the muscles activate to help the airstream mechanism and increase the flow of air. The subcostal muscles and the transverse thoracic muscles shrink the rib cage, while the abdominal muscles (the transverse, internal oblique, external oblique and the rectus abdominis) “compress the abdomen” (25). In an experiment [3] described by Clark (27), Ladefoged, among others, showed that significant energy must be expended to maintain the airstream once the exhalation phase reaches zero capacity. This is why “speech does not exploit this part of the expiratory phase except under extreme conditions” (28).

This post is based on a draft for one of the introductory chapters in my paper.
Next text: The Speech Organs and Airstream 


[1] Another system directly involved in speech is the nervous system.

[2] The discussion here was adapted from Clark’s chapter about the speech organs.

[3] Ladefoged warned, however, that the experiment was done on one subject only.

The Euclidean Distance in Diphthongs – R Graph and Code

2011 November 4

Representing and plotting a distance in an F1/F2 graph, in terms of the Euclidean distance, is relatively easy in R. This post shows one way of doing that. First, we provide sample data, which consist of F1 and F2 values for two diphthong targets. Then we draw the diphthong positions with their starting and ending targets and, finally, calculate the distance. This R code does most of the F1/F2 calculations and drawing.

Data sample (Formants in a Diphthong)

First, a data sample.

ESLStudents
    ascii   ipa     f1      f2
1  aw_l_1 ɑʊl_1 900.96 1600.10
2  aw_l_2 ɑʊl_2 373.61 1082.59 

RPSpeaker
    ascii   ipa     f1      f2
1  aw_l_1 ɑʊl_1 823.07 1542.39
2  aw_l_2 ɑʊl_2 411.39 1405.78

These are the values for the first two formants in /ɑʊ/, as measured in a group of 15 female ESL students and one RP speaker (also female). The number 1 in the notation marks the first vowel target and 2 the second (thus, aw_l_1 is /ɑ/ and aw_l_2 is /ʊ/), while the “l” marks a long diphthong. The two targets will be the start and end points of a line, and the line’s length is expressed by the Euclidean distance.

The Euclidean Distance

The Euclidean distance is the straight-line distance from point A to point B in a Cartesian system, and it is derived from the Pythagorean theorem. Thus, if point A has the coordinates (x1, y1) and point B has the coordinates (x2, y2), the distance between them is calculated using this formula:

distance <- sqrt((x1-x2)^2+(y1-y2)^2)

Our Cartesian coordinate system is defined by the F2 and F1 axes (F2 is the x-axis and F1 the y-axis), and the distance in question is the distance from one diphthong target to the other. The vowel targets, which correspond to points A and B, are defined by the F1/F2 values in Hertz for a particular vowel. In our example above, A and B are rows 1 and 2, and their coordinates are the F2 and F1 frequencies.
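As a minimal sketch of how the formula applies to the sample data, the snippet below rebuilds the two data samples as data frames and computes the distance between rows 1 and 2 of each. The function name diphthong_distance is hypothetical and is not taken from the original script; the formant values are copied from the data sample exactly as labelled there.

```r
# Formant values (Hz) copied from the data sample, as labelled there.
# Row 1 is the first diphthong target, row 2 the second.
ESLStudents <- data.frame(
  ascii = c("aw_l_1", "aw_l_2"),
  f1    = c(900.96, 373.61),
  f2    = c(1600.10, 1082.59)
)
RPSpeaker <- data.frame(
  ascii = c("aw_l_1", "aw_l_2"),
  f1    = c(823.07, 411.39),
  f2    = c(1542.39, 1405.78)
)

# Euclidean distance between the two targets of one diphthong.
# x is F2 and y is F1, matching the axes described above.
# (Hypothetical helper name, used for illustration only.)
diphthong_distance <- function(d) {
  sqrt((d$f2[1] - d$f2[2])^2 + (d$f1[1] - d$f1[2])^2)
}

round(diphthong_distance(ESLStudents), 2)  # 738.86
round(diphthong_distance(RPSpeaker), 2)    # 433.75
```

Because the distance is symmetric, swapping the two rows gives the same result; the direction of the glide only matters for the plot.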

Plotting in R

The third step in the process is plotting, so that we can see a graphical representation of the distance. We can do that by:

  1. Drawing the F1/F2 “coordinate system”.
  2. Drawing the vowels in A and B positions, and connecting them with a line.
  3. Drawing the arrows showing the direction of pronunciation and placing the IPA symbols.

An example looks like this:

[Figure: diphthongs drawn on an F1/F2 plot. The English diphthongs as pronounced by the ESL students and a native RP speaker.]
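The three plotting steps can be sketched in base R roughly as follows. This is not the original script: plot_diphthongs is a hypothetical helper, the reversed axes are an assumption based on the usual vowel-chart layout, and the ASCII labels are used instead of the IPA symbols because drawing IPA glyphs depends on the graphics device and font.

```r
# Formant values (Hz) from the data sample; row 1 = first target, row 2 = second.
ESLStudents <- data.frame(ascii = c("aw_l_1", "aw_l_2"),
                          f1 = c(900.96, 373.61), f2 = c(1600.10, 1082.59))
RPSpeaker   <- data.frame(ascii = c("aw_l_1", "aw_l_2"),
                          f1 = c(823.07, 411.39), f2 = c(1542.39, 1405.78))

# Hypothetical helper: draws every diphthong in a named list of data frames
# and invisibly returns the Euclidean distance for each of them.
plot_diphthongs <- function(groups) {
  all_f1 <- unlist(lapply(groups, function(d) d$f1))
  all_f2 <- unlist(lapply(groups, function(d) d$f2))

  # Step 1: empty F1/F2 "coordinate system"; both axes run from high to low
  # (assumed vowel-chart orientation), with a 100 Hz margin on each side.
  plot(NA, type = "n",
       xlim = rev(range(all_f2)) + c(100, -100),
       ylim = rev(range(all_f1)) + c(100, -100),
       xlab = "F2 (Hz)", ylab = "F1 (Hz)")

  cols  <- seq_along(groups)
  dists <- numeric(length(groups))
  names(dists) <- names(groups)

  for (i in seq_along(groups)) {
    d <- groups[[i]]
    # Step 2: the two targets (points A and B).
    points(d$f2, d$f1, pch = 16, col = cols[i])
    # Step 3: an arrow from target 1 to target 2, plus a label per target.
    arrows(d$f2[1], d$f1[1], d$f2[2], d$f1[2], length = 0.1, col = cols[i])
    text(d$f2, d$f1, labels = d$ascii, pos = 3, col = cols[i])
    dists[i] <- sqrt(diff(d$f2)^2 + diff(d$f1)^2)
  }

  legend("topleft", legend = names(groups), col = cols, lty = 1)
  invisible(dists)
}

distances <- plot_diphthongs(list(ESLStudents = ESLStudents,
                                  RPSpeaker   = RPSpeaker))
round(distances, 2)
```

Returning the distances from the plotting function keeps the drawn arrows and the reported numbers tied to the same rows of data.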

The diphthong /ɑʊ/ is plotted in the lower right corner of the graph. Here are the Euclidean distances for that diphthong (in both variants):

          ESLStudents  RPSpeaker
aw_l ɑʊl  738.86       433.75
aw_s ɑʊs  816.08       471.60

The R code used to plot the graph can be found here.