A diphthong is defined by Jones as “a sound made by gliding from one vowel to another … represented phonetically by sequence of two letters” (Pronunciation 22). A sound realised as a diphthong marks “a change from one vowel quality to another, and the limits of the change are roughly indicated by the two vowel symbols” (O’Connor, Phonetics 155). It is important to note that even though a diphthong is “… phonetically a vowel glide or a sequence of two vowel segments [it] … functions as a single phoneme” (220).
Vowels are speech sounds during whose production “the tongue is held at such a distance from the roof of the mouth that there is no perceptible frictional noise” and “a resonance chamber is formed which modifies the quality of tone” (Jones, Pronunciation 12). Gimson defines vowels as a “category of sounds … normally made with a voiced egressive air-stream, without any closure or narrowing such as would result in the noise component characteristic of many consonantal sounds” (Introduction 35).
The defining property of a diphthongal realisation is that “the organs of speech perform a clearly perceptible movement” (Jones, Outline 63). Gimson notes that diphthongs, or “diphthongal vowel sounds” (Introduction 39), are sounds “which have a considerable voluntary glide”. They are “the sequences of vocalic elements … which form a glide within one movement” (126).
The movement in a diphthong starts from the first element, which is usually a pure vowel (127), and reaches the approximate value of the vowel indicated by the second element, or “the point in the direction of which the glide is made” (126). This direction, whether traced on the cardinal vowel diagram or as tongue movement in the mouth, allows the RP diphthongs to be classified into two groups, closing and centring (Jones, Pronunciation 23-24).
The first element in RP diphthongs is usually [ɪ, e, a, ʊ, ə], while the second is [ɪ, ʊ, ə] (Gimson, Introduction 126). However, one of the characteristics of diphthongs is great regional variety (not discussed here).
Classification of diphthongs into closing and centring
Type        Constituent vowels
Closing     eɪ, ɔʊ, ɑɪ, ɑʊ, ɔɪ
Centring    ɪə, ɛə, ɔə, ʊə
Diphthongs can also be divided into groups based on the vowel to which they gravitate in the second element. Thus, we have groups that have /ɪ/, /ʊ/ and /ə/ as the second element.
Long vowels / diphthongs, grouped by second element:
[ɪ]    eɪ, aɪ, ɔɪ, ʊɪ
[ʊ]    əʊ, ɑʊ
[ə]    ɪə, ɛə, ɔə, ʊə
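The two groupings above can be sketched in a few lines of Python. This is only an illustration: the function names are invented for this example, and it assumes each diphthong is written as exactly two symbols, so that the last character is the second element. A diphthong is treated as centring when it ends in [ə] and as closing otherwise.

```python
from collections import defaultdict

# Diphthong symbols exactly as listed in the grouping above.
DIPHTHONGS = ["eɪ", "aɪ", "ɔɪ", "ʊɪ", "əʊ", "ɑʊ", "ɪə", "ɛə", "ɔə", "ʊə"]


def group_by_second_element(diphthongs):
    """Group two-symbol diphthongs by their second (final) symbol."""
    groups = defaultdict(list)
    for diphthong in diphthongs:
        groups[diphthong[-1]].append(diphthong)
    return dict(groups)


def glide_type(diphthong):
    """Return 'centring' for glides ending in schwa, 'closing' otherwise."""
    return "centring" if diphthong[-1] == "ə" else "closing"


if __name__ == "__main__":
    for second, members in group_by_second_element(DIPHTHONGS).items():
        print(second, ", ".join(members))
```

The sketch works on symbols only; it says nothing about how any of these glides is actually realised by a speaker.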
This post focuses on Received Pronunciation; the examples do not cover other varieties of pronunciation (whether in the UK itself, the USA, Australia or elsewhere). (Here are the RP vowels of English, placed on a vowel diagram, based on the overview in O’Connor’s Phonetics.)
The diphthong /eɪ/ starts “from slightly below the half-close front position and moves in the direction of RP /ɪ/” (Gimson, Introduction 128). The beginning of this diphthong lies between cardinals [e] and [ɛ]. The first element of the diphthong /aɪ/ “varies from central to front” (O’Connor 167) or, in Gimson’s description, it is “slightly behind the front open position i.e. C[ä]” (Introduction 129). The glide ends in the RP /ɪ/ position.
Diphthong /ɔɪ/
The first element of /ɔɪ/ in RP is pronounced very close to cardinal [ɔ]; the second, after the configuration changes, approaches the pronunciation of /ɪ/ (O’Connor, Phonetics 169). In this glide “the range of closing … is not as great as in /aɪ/ …” and “the jaw movement … may not … be as marked as in the case of /aɪ/” (Gimson, Introduction 131). This diphthong can be seen as asymmetrical within the RP system, since it is the “only glide of this type with a back starting point” (132).
The realisation of the diphthong /əʊ/ starts with the articulators in the “typical RP [ɜ:] position”, after which the tongue moves “slightly up and back to RP [ʊ], but the starting point may vary …” (O’Connor 167). In conservative pronunciation this diphthong starts “in a more retracted region”, near centralised (or centralised-open) [o], “and the whole glide is accompanied by increasing lip-rounding” (Gimson, Introduction 133). In an affected variant, the diphthong starts from a more centralised-closed [ɜ] position (134). Also, “in many speakers of general RP, the 1st (central) element is so long that there may rise for a listener a confusion between /əʊ/ and /ɜ:/, especially when [ɫ] follows, e.g. goal, girl … ” (134).
The diphthong /ɑʊ/ starts “further back than /aɪ/ and changes towards RP /ʊ/” (O’Connor, Phonetics 168); Gimson describes it as starting “slightly more fronted … than RP /ɑ:/” (Introduction 136). Another dominant diphthong in the back region is /əʊ/, so /ɑʊ/ has to be pronounced with a perceivable difference – for this reason no raising is possible without losing the contrast, and so “fronting or retraction” (136) prevails in the variants of /ɑʊ/.
The diphthong /ɪə/ is one of the centring diphthongs (/ɪə/, /ɛə/ and /ʊə/). It starts with the tongue positioned for /ɪ/. The second part of the glide takes one of two forms. The first is “the more open variety of /ə/ when /ɪə/ is final in the words”; in the second, found in non-final positions, the movement is not so extensive (Gimson, Introduction 142). The two pronunciations are, in essence, “two main allophones of /ɪə/ in RP, corresponding to those of /ə/” (O’Connor, Phonetics 170).
The diphthong /ɛə/ “starts at cardinal /ɛ/ or below and moves to more central but equally open position” (171). Gimson adds that, when the diphthong is final, its /ə/ element acquires a more open position, while in cases where /ɛə/ is “closed by a consonant”, the /ə/ is of the “mid … type” (Introduction 143). The variants differ mostly in the degree of openness of the first element (143).
The glide /ʊə/ has “coalesced with /ɔ:/ for most RP speakers” (Gimson, Introduction 145) and “[a] monophthongal pronunciation is … found regularly before /r/ in, e.g. alluring, furious, having the quality of the diphthong’s beginning point” (O’Connor, Phonetics 172). Gimson also gives an overview of the monophthongal pronunciation, such as in the words your, Shaw or sure, but warns “that such lowering or monophthongization of /ʊə/ is rarer in case of less commonly used monosyllabic words such as moor, tour, dour” (Introduction 145). The diphthong is pronounced with the first element around /ʊ/, while the second element reaches a “more open type of /ə/” (144).
Notes about Length and Targets
With the exception of the falling diphthongs, “most of the height and stress associated [with the sound] is concentrated on the 1st element, the 2nd element being only lightly sounded” (126). The diphthongs are as long as the long pure vowels, which means they are subject to the same syllabic fortis and lenis rules.
Harrington describes a study by Cottinfield that applies Pols’s hypotheses about the classification of diphthongs to American English and examines how important the targets are for that classification. The first hypothesis is about “dual target” (or onset plus offset), the second about “onset plus slope”, and the third about “onset plus direction”. According to the first hypothesis, “both diphthong targets are critical for identification [of a diphthong]”, while the second claims that “quality is presumed to depend on the first target”; the third hypothesis postulates that “the first target and the direction of spectral movement” are the biggest contributors to diphthong recognition (Techniques 66).
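To make the difference between the three hypotheses more concrete, here is a rough Python sketch of how each one might turn the same measurements into a different description of a diphthong. Everything in it is an assumption made for illustration: the function names, the use of an (F1, F2) pair in Hz for each target, the slope as change per second, and the direction as an angle in the F1–F2 plane. It is not the procedure used in the study Harrington describes.

```python
import math


def dual_target(onset, offset):
    """Onset plus offset: keep both targets, (F1, F2) at the start and end."""
    return (*onset, *offset)


def onset_plus_slope(onset, offset, duration_s):
    """Onset plus slope: the first target and the rate of formant change (Hz/s)."""
    slopes = tuple((off - on) / duration_s for on, off in zip(onset, offset))
    return (*onset, *slopes)


def onset_plus_direction(onset, offset):
    """Onset plus direction: the first target and the angle of movement (radians)."""
    d_f1 = offset[0] - onset[0]
    d_f2 = offset[1] - onset[1]
    return (*onset, math.atan2(d_f2, d_f1))


if __name__ == "__main__":
    # Hypothetical formant values, chosen only to show a falling F1 and rising F2.
    onset, offset = (700.0, 1200.0), (400.0, 2000.0)
    print(dual_target(onset, offset))
    print(onset_plus_slope(onset, offset, duration_s=0.25))
    print(onset_plus_direction(onset, offset))
```

Note that the direction-based description ignores how far and how fast the glide travels, while the slope-based one depends on duration; this is the kind of contrast the three hypotheses are meant to tease apart.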
 The figures in the text were derived from O’Connor’s Phonetics.