A diphthong is defined by Jones as “a sound made by gliding from one vowel to another … represented phonetically by sequence of two letters” (Pronunciation 22). A sound realised as a diphthong marks “a change from one vowel quality to another, and the limits of the change are roughly indicated by the two vowel symbols” (O’Connor, Phonetics 155). It is important to note that even though a diphthong is “… phonetically a vowel glide or a sequence of two vowel segments [it] … functions as a single phoneme” (220).
Vowels are speech sounds during whose production “the tongue is held at such a distance from the roof of the mouth that there is no perceptible frictional noise” and “a resonance chamber is formed which modifies the quality of tone” (Jones, Pronunciation 12). Gimson defines vowels as a “category of sounds … normally made with a voiced egressive air-stream, without any closure or narrowing such as would result in the noise component characteristic of many consonantal sounds” (Introduction 35).
The critical property of the diphthongal realisation of a sound is that “the organs of speech perform a clearly perceptible movement” (Jones, Outline 63). Gimson notes that diphthongs, or “diphthongal vowel sounds” (Introduction 39), are sounds “which have a considerable voluntary glide”. They are “the sequences of vocalic elements … which form a glide within one movement” (126).
The movement in a diphthong starts from the first element, which is usually a pure vowel (127), and reaches the approximate value of the vowel indicated by the second element, or “the point in the direction of which the glide is made” (126). The direction of the glide, whether on the cardinal vowel diagram or in terms of the tongue position in the mouth, allows the RP diphthongs to be classified into two groups, closing and centring (Jones, Pronunciation 23-24).
The first element in RP diphthongs is usually [ɪ, e, a, ʊ, ə], while the second is [ɪ, ʊ, ə] (Gimson, Introduction 126). However, one of the characteristics of diphthongs is great regional variety (not discussed here).
Classification of diphthongs into closing and centring
Type Constituent vowels
Closing eɪ, əʊ, aɪ, ɑʊ, ɔɪ
Centring ɪə, ɛə, ɔə, ʊə
Diphthongs can also be divided into groups based on the vowel to which they gravitate in the second element. Thus, we have groups that have /ɪ/, /ʊ/ and /ə/ as the second element.
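The two groupings above can be expressed as a small lookup. What follows is a minimal Python sketch, not something taken from the cited sources: the names `RP_DIPHTHONGS`, `classify_diphthong` and `group_by_second_element` are hypothetical, and the symbols follow the O’Connor-style transcription used in the vowel list of this post.

```python
# Illustrative sketch only: names are hypothetical, and the symbols follow
# the transcription used in this post (not a general phonetic standard).

RP_DIPHTHONGS = ["eɪ", "əʊ", "aɪ", "ɑʊ", "ɔɪ", "ɪə", "ɛə", "ɔə", "ʊə"]


def classify_diphthong(diphthong: str) -> str:
    """Return 'closing' or 'centring', judged by the second element."""
    second = diphthong[-1]
    if second in ("ɪ", "ʊ"):
        return "closing"   # the glide moves towards a closer vowel
    if second == "ə":
        return "centring"  # the glide moves towards the central vowel
    raise ValueError(f"unexpected second element in {diphthong!r}")


def group_by_second_element(diphthongs):
    """Group diphthongs by the vowel they glide towards (/ɪ/, /ʊ/ or /ə/)."""
    groups = {}
    for d in diphthongs:
        groups.setdefault(d[-1], []).append(d)
    return groups
```

Calling `group_by_second_element(RP_DIPHTHONGS)` gives three groups keyed by /ɪ/, /ʊ/ and /ə/; the first two together make up the closing diphthongs, and the third the centring ones.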
This post focuses on Received Pronunciation (RP), and the examples do not cover other variants of pronunciation (whether in the UK itself, the USA, Australia or elsewhere). (Here are the RP vowels of English, placed on a vowel diagram, based on the overview in O’Connor’s Phonetics.)
The diphthong /eɪ/ starts “from slightly below the half-close front position and moves in the direction of RP /ɪ/” (Gimson, Introduction 128). The beginning of this diphthong lies between cardinals [e] and [ɛ]. The first element of the diphthong /aɪ/ “varies from central to front” (O’Connor 167) or, in Gimson’s description, it is “slightly behind the front open position i.e. C[ä]” (Introduction 129). The glide ends at the RP /ɪ/ position.
Diphthongs /ɔɪ/ and /əʊ/
The first element of /ɔɪ/ in RP is pronounced very close to cardinal [ɔ], and the second, after the configuration changes, is close to the pronunciation of /ɪ/ (O’Connor, Phonetics 169). In this glide “the range of closing … is not as great as in /aɪ/ …” and “the jaw movement … may not … be as marked as in the case of /aɪ/” (Gimson, Introduction 131). This diphthong can be seen as asymmetrical in the RP system, since it is the “only glide of this type with a back starting point” (132).
The realisation of the diphthong /əʊ/ starts with the articulators in the “typical RP [ɜ:] position”, after which the tongue moves “slightly up and back to RP [ʊ], but the starting point may vary …” (O’Connor 167). In conservative pronunciation this diphthong starts “in a more retracted region”, near centralised (or centralised-open) [o], “and the whole glide is accompanied by increasing lip-rounding” (Gimson, Introduction 133). In an affected variant, the diphthong starts from a more centralised-closed [ɜ] position (134). Also, “in many speakers of general RP, the 1st (central) element is so long that there may rise for a listener a confusion between /əʊ/ and /ɜ:/, especially when [ɫ] follows, e.g. goal, girl … ” (134).
The diphthong /ɑʊ/ starts “further back than /aɪ/ and changes towards RP /ʊ/” (O’Connor, Phonetics 168); Gimson describes it as starting “slightly more fronted … than RP /ɑ:/” (Introduction 136). Another dominant diphthong in the back region is /əʊ/, so /ɑʊ/ has to be pronounced with a perceivable difference – for this reason no raising is possible without losing the contrast, and so “fronting or retraction” (136) prevails in the variants of /ɑʊ/.
Diphthong /ɪə/ is one of the centring diphthongs (/ɪə/, /ɛə/ and /ʊə/). It starts with the tongue positioned for /ɪ/. The second part of the glide takes one of two forms. The first is “the more open variety of /ə/ when /ɪə/ is final in the words”, while in the second, in non-final positions, the movement is not so extensive (Gimson, Introduction 142). The two pronunciations are, in essence, “two main allophones of /ɪə/ in RP, corresponding to those of /ə/” (O’Connor, Phonetics 170).
Diphthong /ɛə/ “starts at cardinal /ɛ/ or below and moves to more central but equally open position” (171). Gimson adds that, when final, /ə/ acquires a more open position, while in cases where /ɛə/ is “closed by a consonant”, /ə/ is of “mid … type” (Introduction 143). The variants differ mostly in the degree of openness of the first element (143).
The glide /ʊə/ has “coalesced with /ɔ:/ for most RP speakers” (Gimson, Introduction 145) and “[a] monophthongal pronunciation is … found regularly before /r/ in, e.g. alluring, furious, having the quality of the diphthong’s beginning point” (O’Connor, Phonetics 172). Gimson also gives an overview of the monophthongal pronunciation, such as in the words your, Shaw or sure, but warns “that such lowering or monophthongization of /ʊə/ is rarer in case of less commonly used monosyllabic words such as moor, tour, dour” (Introduction 145). The diphthong is pronounced with the first element around /ʊ/, while the second element reaches a “more open type of /ə/” (144).
Notes about Length and Targets
With the exception of the falling diphthongs, “most of the height and stress associated [with the sound] is concentrated on the 1st element, the 2nd element being only lightly sounded” (126). The length of the diphthongs is the same as that of the long pure vowels, which means they are affected by the same syllabic fortis and lenis rules.
Harrington describes a study based on the hypotheses by Pols, about classification of diphthongs applied in American English by Cottinfield, and the importance of the targets for the classification. The first hypothesis is about “dual target” (or onset plus offset), the second about “onset plus slope”, while the third involves “onset plus direction”. According to the first hypothesis, “both diphthong targets are critical for identification [of a diphthong]”, while the second claims that “quality is presumed to depend on the first target”; the third hypothesis postulates that “the first target and the direction of spectral movement” are the biggest contributors in diphthong recognition (Techniques 66).
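To make the difference between the three hypotheses more concrete, here is a rough Python sketch of the kind of feature set each one implies. It is an illustration under stated assumptions, not the procedure from the study Harrington describes: the function name `diphthong_features` is hypothetical, each target is assumed to be a pair of formant values (F1, F2) in Hz, slope is assumed to be the formant change per second, and direction is assumed to be the angle of movement in the F1/F2 plane.

```python
import math


def diphthong_features(onset, offset, duration_s):
    """Build one feature set per hypothesis (illustrative assumptions only).

    onset, offset: hypothetical (F1, F2) targets in Hz.
    duration_s:    time between the two targets, in seconds.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")

    d_f1 = offset[0] - onset[0]
    d_f2 = offset[1] - onset[1]

    return {
        # Hypothesis 1: both targets matter (onset plus offset).
        "dual_target": (onset, offset),
        # Hypothesis 2: the first target plus the rate of formant change.
        "onset_plus_slope": (onset, (d_f1 / duration_s, d_f2 / duration_s)),
        # Hypothesis 3: the first target plus the direction of movement,
        # here taken as an angle in degrees measured from the F1 axis.
        "onset_plus_direction": (onset, math.degrees(math.atan2(d_f2, d_f1))),
    }
```

In this sketch the three feature sets share the onset and differ only in how much of the rest of the glide they keep: the full second target, the speed of the movement, or just its direction.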
 The figures in the text were derived from O’Connor’s Phonetics.
Vowels are speech sounds pronounced with no “obstacles” to the airstream (unlike consonants, for example). This post lists the English vowels (21 in this case, although some sources list 22), both monophthongs and diphthongs. They are grouped into long and short ones. There is also a vowel diagram showing the vowels at their approximate positions.
The English vowels with examples (O’Connor, first edition 1973)
IPA (O'Connor) Examples
1 i: see, unique, feel
2 ɪ wit, mystic, little
3 e set, meant, bet
4 æ pat, cash, bad
5 ɑ: half, part, father
6 ɒ not, what, cost
7 ɔ: port, caught, all
8 ʊ wood, could, put
9 u: you, music, rude
10 ʌ bus, come, but
11 ɜ: bird, word, fur
12 ə alone, butter
13 eɪ lady, make
14 əʊ go, home
15 aɪ my, time
16 ɑʊ now, round
17 ɔɪ boy, noise
18 ɪə here, beard
19 ɛə fair, scarce
20 ɔə more, board
21 ʊə pure, your
Gimson (Introduction 90) sorts English vowels into three groups: short, long “relatively pure” and long “diphthongal glides, with prominent 1st element”.
Short and long monophthongs in English
short ɪ e æ ɒ ʊ ʌ ə
long i: u: ɑ: ɔ: ɜ:
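The three-way grouping attributed to Gimson above can be checked against the 21 vowels listed in this post with a few lines of Python. This is only a sketch: the names `VOWELS` and `vowel_group` are hypothetical, and the rule relies on this post's transcription, where long vowels carry a colon as length mark and diphthongs are written with two vowel symbols. It would not hold for other transcription systems.

```python
from collections import Counter

# The 21 vowels as transcribed in this post (length mark written as ":").
VOWELS = [
    "i:", "ɪ", "e", "æ", "ɑ:", "ɒ", "ɔ:", "ʊ", "u:", "ʌ", "ɜ:", "ə",
    "eɪ", "əʊ", "aɪ", "ɑʊ", "ɔɪ", "ɪə", "ɛə", "ɔə", "ʊə",
]


def vowel_group(symbol: str) -> str:
    """Assign a vowel to 'short', 'long' or 'diphthong' from its spelling."""
    if symbol.endswith(":"):
        return "long"       # long, "relatively pure" vowel
    if len(symbol) == 2:
        return "diphthong"  # two vowel symbols: a diphthongal glide
    if len(symbol) == 1:
        return "short"
    raise ValueError(f"unrecognised transcription: {symbol!r}")


# Tally how many vowels fall into each group.
counts = Counter(vowel_group(v) for v in VOWELS)
```

The length-mark check comes first because a symbol such as i: is also two characters long and would otherwise be mistaken for a diphthong.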
A vowel diagram is used to provide details about the sounds involved. The phoneme /i:/ often has the quality of a diphthong (O’Connor 154), depending on the accent. The arrow on the diagram marks the approximate final location of the sound in its diphthongal realisation. The phoneme /ɪ/ is short and monophthongal. The phoneme /e/ is “in RP … generally realised … as a short, front vowel between cardinals [e] and [ɛ]” (O’Connor 156), while /æ/ is also a short vowel, but between cardinals [ɛ] and [a]; it is usually realised as a monophthong.
The phoneme /ʌ/ is a “short almost open central vowel”, while /ɑ:/ is an “open, rather back vowel” (O’Connor 157-8). The phoneme /ɒ/ is pronounced by speakers of RP as “a short, back, open or almost open vowel” (158). In a word such as caught there is the phoneme /ɔ:/. In the diagram, /ɔ:/ is just below the cardinal vowel [o]. The dashed line pointing towards the more central position illustrates the fact that many speakers do not make a distinction between the monophthong /ɔ:/ and the diphthong /ɔə/. In such cases, the speakers “nevertheless use a diphthong [ɔə] … before pause” (160). The consequence is that “both saw and sore are pronounced [sɔə] and both caught and court are pronounced [kɔ:t]” (160).
The phoneme /ʊ/ is somewhat more centralised than cardinal [o], and it shows a relatively constant pronunciation across dialects (162), unlike most other vowels. About /u:/ O’Connor notes that it “most often has a diphthongal realisation … but it may be given a monophthongal pronunciation slightly lower and more central than cardinal [u]” (162). The diphthongal property of the vowel is indicated by an arrow in the diagram. The phoneme /ɜ:/ is “typically a long, mid, central vowel”, but in rhotic accents (American English, for example) this vowel, in the sequence /ər/, is replaced by the retroflex [ɹ], e.g. in bird (163). The phoneme /ə/ has “two major allophones in RP, one central and half-close which occurs in non-final positions…, and one central and about half open which occurs before pause …” (the example for the first variant is about, and for the second sailor) (164).