Vowels, Vowel Formants and Vowel Modification

(Page 2 of 3)

Vowel Formants

Each phoneme is distinguished by its own unique pattern in the spectrogram. A spectrogram displays the acoustic energy at each frequency, and how this changes with time; formants appear as dark bands. For voiced phonemes, the signature involves large concentrations of energy (formants). During voicing, which is a relatively long phase, the spectral or frequency characteristics of a formant evolve as phonemes unfold and succeed one another. Formants that are relatively unchanging over time are found in the monophthong vowels and the nasals; formants which are more variable over time are found in the diphthong vowels and the approximants, but in all cases the rate of change is relatively slow.

Within each formant, and typically across all active formants, there is a characteristic waxing and waning of energy in all frequencies. This cyclic pattern is caused by the repetitive opening and closing of the vocal folds, which occurs about 125 times per second (125 Hz) on average in the adult male, and approximately twice as fast (250 Hz) in the adult female, giving rise to the sensation of pitch.
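As a rough illustration of these figures, the short sketch below converts the two average rates quoted above into the duration of a single open-close cycle, and lists the first few harmonics (whole-number multiples of the fundamental). The function names are illustrative only, and the 125 Hz and 250 Hz values are simply the averages given in this article, not measurements of any particular voice.

```python
# Period of one vocal-fold cycle and the first few harmonics for a given
# fundamental frequency (f0). Function names are illustrative only.

def cycle_period_ms(f0_hz: float) -> float:
    """Duration of one open-close cycle of the vocal folds, in milliseconds."""
    return 1000.0 / f0_hz


def harmonics_hz(f0_hz: float, count: int = 5) -> list[float]:
    """First `count` harmonics: whole-number multiples of the fundamental."""
    return [f0_hz * n for n in range(1, count + 1)]


# Average values quoted in the text above.
for label, f0 in [("adult male", 125.0), ("adult female", 250.0)]:
    print(f"{label}: f0 = {f0:.0f} Hz, one cycle = {cycle_period_ms(f0):.0f} ms")
    print("  first harmonics (Hz):", harmonics_hz(f0))
```

At 125 Hz each cycle lasts 8 milliseconds, and at 250 Hz it lasts 4 milliseconds, which is why the waxing and waning appears as very fine, closely spaced striations on a spectrogram.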

The vocal tract acts as a resonant cavity, and the positions of the jaw, lips, and tongue – as in the articulation of specific vowel sounds - affect the parameters of the resonant cavity, resulting in different formant values.

The information that humans require to distinguish between vowels can be represented purely quantitatively by the frequency content of the vowel sounds; that is, the different vowel qualities are realized in acoustic analyses of vowels by the relative values of the formants - the acoustic resonances of the vocal tract. Each vowel has its own distribution of acoustic energy that distinguishes it from all other vowels. Vowels will almost always have four or more distinguishable formants. However, the first two formants are the most important in determining vowel quality and in differentiating it from other vowels.

Each vowel, therefore, has its own ‘fingerprint’, which is defined or characterized by its unique frequencies at the first and second formants. These formants are usually referred to as the ‘vowel formants’. They are not adjustable for each given vowel or variant of each vowel. For example, the formant frequencies for the [i] vowel for any given voice are more or less constant and remain within very specific limits in the frequency range. For this reason, these vowel formants may also be called ‘fixed formants’. If these vowel formants are not produced by the vocal tract, the particular vowel can’t exist. Conversely, whenever the formants for a particular vowel are present, that vowel is heard.

The chart below shows the individual values (frequencies) for the first and second formants for each of the five pure Italian vowels. While it isn’t necessary for a student of voice to memorize these specific frequency ranges (formant values), knowing which vowels have higher or lower formants – both the first and the second – will aid in the process of formant tuning.

Vowel Formant Centres

IPA Vowel Symbol    Main Formant Region            Formant 1 (f1)    Formant 2 (f2)
u                   200–400 Hz                     320 Hz            800 Hz
o                   400–600 Hz                     500 Hz            1000 Hz
a                   800–1200 Hz                    1000 Hz           1400 Hz
e                   400–600 and 2200–2600 Hz       500 Hz            2300 Hz
i                   200–400 and 3000–3500 Hz       320 Hz            2500 Hz
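To make the idea of a vowel ‘fingerprint’ concrete, here is a minimal sketch that stores the f1 and f2 centres from the chart above and finds which of the five vowels lies closest to a given pair of formant values. The straight-line distance in Hz is an assumption made purely for illustration; it is not how the ear weighs frequencies, and the names used are hypothetical.

```python
import math

# Formant centres (Hz) copied from the chart above: vowel -> (f1, f2).
FORMANT_CENTRES = {
    "u": (320, 800),
    "o": (500, 1000),
    "a": (1000, 1400),
    "e": (500, 2300),
    "i": (320, 2500),
}


def nearest_vowel(f1_hz: float, f2_hz: float) -> str:
    """Return the chart vowel whose (f1, f2) centre is closest to the input.

    Assumption: plain straight-line distance in Hz. Listeners do not weigh
    frequencies this way, so treat this as a toy comparison only.
    """
    return min(
        FORMANT_CENTRES,
        key=lambda v: math.dist((f1_hz, f2_hz), FORMANT_CENTRES[v]),
    )


print(nearest_vowel(350, 2400))  # lies nearest the [i] centre
print(nearest_vowel(950, 1300))  # lies nearest the [a] centre
```

The example values are invented; formant values measured from a real voice will vary from singer to singer and from note to note.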

These formants can be experienced by gently thumping the side of the larynx with a finger while mouthing the five pure Italian vowels [e, i, a, o, u]. The [i] and [u] vowels have the lowest resonance of the vocal tract, whereas the [a] vowel has the highest resonance, while the [e] lies somewhere in between. These are the first formant (abbreviated as f1) resonances of these vowels for your particular voice. If you modify these vowels, you will hear other f1 resonances for the modified vowel.

Each vowel has its own laryngeal configuration, and there is a corresponding vocal tract configuration that permits a specific vowel to take on its distinctive form. It is impossible to establish a one-to-one relation between given articulatory features and the formant values, as many variables play a role. Even after speaker-specific parameters are removed, every formant is determined by the joint effect of several articulatory characteristics (including vowel height, backness and roundedness).

However, a consistent relation can be observed between the degree of closing (vs. opening) of the oral cavity and the f1 values, and between the degree of back displacement (vs. front displacement) of the tongue and the f2 values: the more closed the oral cavity, the lower the f1, and the further back the tongue, the lower the f2. For example, [i] has a relatively low f1 value, because the oral cavity is rather closed, and a high f2 value, because the tongue is displaced to the front, while for [a], it is the other way round. In particular, in back vowels such as [a], the f1 and f2 values are closer to each other, while in front vowels they are more distant. Vowels traditionally known as ‘front’ have f1 and f2 a good distance apart. Vowels traditionally known as ‘back’ have f1 and f2 so close that they may appear to touch.

The first formant increases in frequency as the vowels go from close to open. As can be seen in the chart above, the first formant (f1) has a higher frequency for an open vowel, such as [a], and a lower frequency for a close vowel, such as [i] or [u].

The second formant increases in frequency as the vowels go from back to front. The second formant (f2) has a higher frequency for a front vowel, such as [i], and a lower frequency for a back vowel, such as [u].
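These two trends can be checked against the formant centres given in the chart of the five Italian vowels. The sketch below simply sorts the vowels by f1 and then by f2; the variable names are illustrative, and the values are the chart's centres rather than measurements of any individual voice.

```python
# Formant centres (Hz) from the chart of the five Italian vowels: (f1, f2).
FORMANTS = {
    "u": (320, 800),
    "o": (500, 1000),
    "a": (1000, 1400),
    "e": (500, 2300),
    "i": (320, 2500),
}

# Rising f1: close vowels come first, open vowels last.
by_height = sorted(FORMANTS, key=lambda v: FORMANTS[v][0])

# Rising f2: back vowels come first, front vowels last.
by_frontness = sorted(FORMANTS, key=lambda v: FORMANTS[v][1])

print("close -> open (rising f1):", by_height)
print("back -> front (rising f2):", by_frontness)
```

Sorted by f1, the close vowels [u] and [i] share the lowest value and the open vowel [a] comes last; sorted by f2, the order runs from [u] at the back to [i] at the front.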

The third formant (f3) is involved in the differentiation between rounded and unrounded vowels (e.g., the difference between [i] and [y]).

A vowel is called compact when f1 and f2 are close together, such as [a], and diffuse when they are far apart.

There are certain ‘rules’ that help acousticians predict the formant structure of vowels. First, the area of the major constriction determines the location of f1; as the area decreases, f1 also decreases. Second, the distance from the glottis to the major constriction determines the location of f2; as the distance increases, f2 also increases. Third, for a given area and distance from the glottis of the major constriction in a vowel, lip rounding causes f1 and f2, especially f2, to fall. Thus, the back vowel [u] has a lower f2 than the second rule alone would predict because the lips are rounded when pronouncing this vowel.

The front, close vowel [i], for instance, has acoustic strength in the upper part of the spectrum near the region of the Singer’s Formant, whereas the more neutral, open vowel [a] has its acoustic strength in the lower half of the spectrum. The back vowels [o] and [u] are defined at increasingly low acoustic levels.

The first formant corresponds to the vowel openness (vowel height). Open vowels have high f1 frequencies while close vowels have low f1 frequencies. [i] and [u] have similarly low first formants, whereas [ɑ] has a higher first formant. [ɑ] is a low vowel, so its f1 value is higher than that of [i] and [u], which are high vowels. [i] is a front vowel, so its f2 is substantially higher than that of [u] and [ɑ], which are back vowels.

The second formant, f2, corresponds to vowel frontness. Back vowels have low f2 frequencies while front vowels have high f2 frequencies. The front vowel [i] has a much higher f2 frequency than the other two vowels. However, in open vowels the high f1 frequency forces a rise in the f2 frequency, as well, so an alternative measure of frontness is the difference between the first and second formants. For this reason, some people prefer to plot f1 versus f2 - f1 (that is, f2 minus f1). (This dimension is usually called ‘backness’ rather than ‘frontness’, but the term ‘backness’ can be counterintuitive when discussing formants.)
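The alternative measure described above, the difference between the second and first formants, can be worked out from the formant centres in the chart of the five Italian vowels. The sketch below is only an illustration with hypothetical names; it computes the point each vowel would occupy on a plot of f1 against that difference, and compares the ordering it gives with the ordering by raw f2.

```python
# Formant centres (Hz) from the chart of the five Italian vowels: (f1, f2).
FORMANTS = {
    "u": (320, 800),
    "o": (500, 1000),
    "a": (1000, 1400),
    "e": (500, 2300),
    "i": (320, 2500),
}


def plot_point(vowel: str) -> tuple[int, int]:
    """Coordinates for a plot of f1 against (f2 minus f1), both in Hz."""
    f1, f2 = FORMANTS[vowel]
    return (f1, f2 - f1)


for vowel in FORMANTS:
    f1, difference = plot_point(vowel)
    print(f"[{vowel}]  f1 = {f1} Hz,  f2 - f1 = {difference} Hz")

# Ordering by raw f2 versus ordering by the difference f2 - f1.
by_f2 = sorted(FORMANTS, key=lambda v: FORMANTS[v][1])
by_difference = sorted(FORMANTS, key=lambda v: plot_point(v)[1])
print("by f2:     ", by_f2)
print("by f2 - f1:", by_difference)
```

With these chart values, the open vowel [a] has a higher raw f2 than [u] and [o], yet the smallest difference between its two formants, which shows how the high f1 of an open vowel pulls the raw f2 figure upward.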

Practical Guidelines For Singing Vowels

Most vocal training is done using the ‘five pure Italian vowels’. It is assumed by teachers that using these ‘landmark’ vowels in training will enable their students to sing any vowel, because other vowels are merely modified forms or variations of them, requiring very slight adjustments of the vocal tract in order to articulate them. Keeping vocal training to just five vowels also makes sense from a practical standpoint, as time during lessons is limited, and there is simply no way to train intensely on all vowels.

It is estimated that we actually have between twelve and fifteen vowel sounds commonly used in the English language. These additional vowel sounds can be found in the IPA Vowels and Symbols chart, along with some words that include these sounds.

Vowels are properly formed at the front of the mouth. The back of the tongue must remain stable - neither raised nor depressed into the larynx. The middle of the tongue forms the individual vowels, and the tip of the tongue should remain in its resting place behind the lower front teeth, except in the formation of certain consonants.

As explained in Singing With An ‘Open Throat’: Vocal Tract Shaping, it is important for a singer to initiate phonation (sound) with an open pharynx, high soft palate and relaxed, low larynx. Many students of voice, however, begin singing their vowels with closed vocal tracts, with the back of the tongue pressed up against a lowered soft palate, and the vocal folds too tightly closed. In speech, this kind of pressed phonation is very common, although we don’t notice it as much because we don’t sustain our vowels for as long as we do during singing, and because the consonants that usually begin syllables buffer this effect.

Say the vowel [a], for example, and pay close attention to how the back of the tongue and soft palate touch, closing off the acoustical space, before the vowel is spoken. When phonation begins, these articulators must separate from each other. The vocal folds themselves are also firmly closed, which then requires more air pressure to blow them apart and set them vibrating. You will notice that the same thing likely happens with all spoken vowels (when spoken either in isolation or at the beginning of a word).

During many vocal exercises, the vowel is not preceded by a consonant, so this tendency toward pressed phonation becomes more pronounced during training than during the singing of text. Many students’ voices sound ‘pressed’ or squeezed at the onset of the vowel because they tend to maintain the same habits that they have during speech. For many singers, this overly firm onset gives a sense of (or provides the illusion of) clearer definition to the vowels that they are singing, but it should not be mistaken for correct, efficient and healthy onsets of sound. In fact, pressed phonation can lead to vocal injury, and it also strips away some of the pleasantness of the tone due to its negative acoustical effects.

It takes some retraining for singers to learn to open up the acoustical space first, and then begin phonating (producing sound) without this kind of ‘closed throatedness’. I often tell my students to picture the acoustical ‘tube’ (vocal tract) being open right from the start, before sound is made – this correctly takes place when the singer is preparing to sing (i.e., at the time of inhalation) - and then allow the air and sound to pass through the already open spaces of the throat and mouth.

To learn to establish the openness in the throat before singing the vowel, a singer can use the neutral vowel ‘UH’, as in the word ‘good’, in the larynx before bringing focus into the tone. For example, start with the ‘uh’ in the larynx and then bring the tongue forward as in the [i] vowel. The open pharynx is established first, so that the brilliance of the tone can follow while keeping the open feeling in the throat.

Because of this tendency to begin phonation of vowels with a closed throat, it is essential for students to spend a great deal of time vocalizing solely on vowels. If a teacher has a student always insert consonants before all vowels during exercises, it is quite possible that this pressed phonation may slip under the radar. Once a singer learns to consistently maintain an open vocal posture during the singing of vowels, and especially at the start of them, consonants can then be added to the exercises. The teacher and student can then be reassured that the mode of phonation is healthy and that tonal balance will be achievable.

Vowels are the most important element of singing because they produce the greatest acoustical energy. All sustained notes, therefore, should be vocalized on the vowel sounds, not on consonants, which have higher levels of impedance, and therefore do not resonate as effectively. Singers must be careful not to sing the consonants at the end of syllables prematurely, which will add a constriction to the resonating space and thus limit resonance and volume. The vowel sounds must be held for as long as possible, with the consonants being added only at the last possible second.

While training on vowels is very important to good vocal training, including the singing of text (words, sentences, songs, etc.) in training methods enables a singer to practice singing different variations of vowels in different combinations with consonants. The singing of text represents an important ‘application’ aspect of vocal training. Balanced training will have the singer vocalizing on all of the applicable vowel sounds heard in the English language.

Having my students vocalize mainly on the five pure Italian vowels has revealed some unique tendencies and problems associated with the production of each of these vowels.

The most common tendency for vocalists (in most English dialects) when singing the vowel sound [e], as in ‘late’, is to sing the vowel sound as a diphthong. It usually takes a little practice to find the more pure version of this vowel sound, allowing the initial vowel sound to be held as the vowel core.

A lot of students of voice struggle with the vowel sound [i], as in ‘meet’, because it is a lateral or fronted vowel, making it very ‘closed’. Many students have difficulties maintaining openness in the vocal tract, and this vowel will often sound tight or ‘squeezed’, especially in the upper middle and the head registers. In the middle register, many singers tend to flatten the tongue in order to darken this vowel, which creates a muffled, distorted sound to the vowel. In head voice, many students have difficulties with thinness or shrillness (too much brightness) in the tone when singing such close vowels. Other singers, however, actually find this vowel easier to sing than more open vowels because they find it easier to keep breathiness at bay, and because its relatively low f1 value and high f2 value often create a louder, more resonant sound inside the singer’s internal hearing.

The vowel sound [a], as in ‘father’, can be particularly problematic for some students, as the sound has a tendency to ‘fall back’ into the throat. In most phonetic systems, the vowel [a] is considered to be the primary, or first, of the back vowel series. (In some others, it is the first of the neutral vowel series.) During normal speech and in unskillful singing, back-vowel production typically brings a loss of upper partials (upper harmonic overtones). This loss of upper partials becomes particularly apparent when this vowel follows a front vowel, such as [i] or [e], because a downward acoustic curve occurs in which the perception of the pitch centre of the vowel lowers. A spectrum will display a loss of upper harmonic partials. The vowel-defining second formant, which together with the first formant largely identifies the vowel, demonstrates a downward stepwise progression. The vowel, then, sounds too dark. When the singer learns to retain the third, fourth and fifth formants, regardless of the progression from lateral to rounded vowels, no vowel will ‘fall back’.

Also, because [a] is an open vowel, many singers tend toward breathy phonation while singing it. Many students find this vowel more comfortable to sing than closer vowels, and may relax their breath support and vocal fold approximation as a result.

The vowel sound [o], as in ‘note’, has two different ‘phases’ in its production. In speech, the mouth is more rounded and more open at the beginning of the vowel than it is at the end of the vowel. At the end of the vowel, the lips become pursed, which creates an [u] sound, making the vowel a sort of diphthong. A singer must be very careful not to allow the lips to come forward at any point during the singing of the vowel [o]. I always encourage my students to maintain the core of the vowel; that is, to sing only the beginning of the vowel, with the lips nicely rounded and the mouth in a circle. Otherwise, the vowel begins to sound like another vowel, [u]. In contemporary genres, it is more acceptable to allow the lips to move forward to complete the vowel as it would be completed during speech to allow for better diction and word recognition, but only at the very last possible second.

A common difficulty with the vowel sound [u], as in ‘shoe’, is that the tone becomes overly dark and ‘hooty’ sounding. Again, as a singer proceeds from front to back vowels, the spectra undergo significant changes, which brings about a reduction or loss of upper partials. [u] is the final member of the back-vowel series. In speech, it tends to sound darker or duller than do front vowels, and lower in pitch than even neighbouring back vowels. (I'll explain in the section on vowel modification that all vowels have their own pitch.)

Any tendency to pucker the mouth into a small, rounded opening for the production of [u] removes overtone activity. The upper lip should not be pulled downward, covering the upper teeth, because the zygomatic muscles would then be lowered toward the grimacing posture. This gesture alters both internal and external shapes of the resonators.

Also, avoid the tendency to allow the lower jaw to slide forward during the production of the [u] vowel sound.

Nasal Vowels and Nasalization

Nasal vowels are vowels that are produced with a lowering of the velum, so that air escapes through the nose as well as the mouth. (They stand in contrast to oral vowels, which are ordinary vowels without nasalization, in which all air escapes through the mouth.) Nasalization refers to a situation whereby some of the air escapes through the nose during speech or singing.

In English, there is no phonemic distinction between nasal vowels and oral vowels, and all vowels are considered phonemically oral. However, vowels preceding nasal consonants tend to be nasalized; that is, vowels that are adjacent to nasal consonants are produced partially or fully with a lowered velum in a natural process of assimilation - see co-articulation, above - and are therefore technically nasal, though few speakers would notice. North American speech itself is characterized by a high degree of nasality, with sound partially emitted through the nose on nonnasals (low velar posture producing an open velopharyngeal port). The vowels in words such as ‘come’, ‘home’, ‘man’, ‘found’, and ‘sang’ tend to adopt the nasality of the succeeding or preceding nasal consonant.

Many students of voice tend to nasalize vowels that are adjacent to nasal consonants when they sing lines of text, although few are aware of it, likely because hypernasality is becoming increasingly prevalent in popular music, so much so that they have become accustomed to hearing nasally tones from most of their vocal role models.

In the third paragraph of this article, I stated that a singer’s goal should be to learn to sing vowels without allowing the consonants to get in the way. Also, nasality, apart from that which naturally occurs during intended, intermittent nasal phonemes, is generally considered to be an undesirable tone in singing, and is a less acceptable and technically incorrect vocal element in most genres of music. Therefore, this tendency to nasalize vowels should be avoided during singing. (It should be recalled that opening up the nasal cavity during singing produces antiresonances, which rob the voice of its overtones and decrease natural volume.)

Even in languages that incorporate nasal vowels, such as French and Portuguese, the nasal quality is generally added only to the end of the phoneme or sung vowel so that resonance in the vocal tract can be maximized throughout the sustained note, and so that all active formants of the oral vowel can be present and strengthened. The introduction of nasality into the sustained nasal syllable is delayed or deferred until just before its termination, except where the note is very quick, which requires near immediate nasalization of the syllable. Even when nasality is necessary in order for correct pronunciation of the language, a very small amount of nasality will suffice.

With my students, we work through lines of text and analyze the quality of each vowel being sung. We break down the individual phonemes to ensure that the core of the vowel - that is, the part of the vowel that is being sustained - is pure, rather than nasalized. Incorporating repertoire into technical training becomes particularly useful for the student, because successfully vocalizing on vowels alone during vocal exercises does not necessarily guarantee that a student will also successfully maintain the purity of the vowels during the singing of songs when consonants are added. Students tend to have different habits when singing complete words and lines of text than they do while singing single vowels. Natural linguistic patterns require the juxtaposition of all possible spoken phonetic combinations, as occurs in poetic texts set to music.

Even in technical training exercises, it is often beneficial for the singer to combine nasal consonants with vowels in order to encourage rapid and immediate velar shifting from open to closed nasal port or closed to open nasal port, as happens frequently during speech and during the singing of text. If unwanted nasality from an adjacent nasal consonant becomes added to the vowel, a singer can try occluding the nostrils between the thumb and the first finger (pinching the nose) whenever a vowel or nonnasal consonant is being sung. (The nasal cavity must remain open for the nasal consonants, though.) For example, if singing ‘maw’, the singer would pinch the nose at the conclusion of the ‘m’ and the beginning of the oral vowel. Variances in timbre and sensation between the nasal and nonnasal sounds can readily be identified. The singer should also feel an immediate lifting of the velum and closure of the nasal port. This sound and sensation should be memorized so that the singer can learn to consistently avoid nasalizing vowels.

For more information on how to eliminate undesirable nasality from your tone, please read the section on nasally tone in Good Tone Production For Singing.

R-Coloured Vowels

R-coloured vowels, also known as rhotic or rhotacized vowels, are heard in words such as the Midwestern American English pronunciation of ‘fur’ and ‘air’ and before a consonant as in ‘hard’ and ‘beard’. IPA hangs a little ‘r-hook’ diacritic off of the symbol for an r-coloured vowel, as can be seen in the English Vowels and Their IPA Symbols Chart. The English vowel may be analyzed phonemically as an underlying /ǝr/ rather than a syllabic consonant.

R-coloured vowels are also known as vocalic ‘r’s. (In English, the vocalic r occurs as an r-coloured vowel.) In the vocalic r, the rhotic segment (e.g., the [r] or [ɹ]) occurs as the syllable nucleus - the nucleus, (sometimes called its peak), is the central part of the syllable, most commonly a vowel - in words like ‘butter’ and ‘church’.

R-coloured vowels occur in a number of rhotic accents of English, like General American. These vowels are absent in ‘r-drop’ or non-rhotic dialects, such as those typical of the North American South and New England region, and Received Pronunciation in Great Britain. In these latter dialects, the preceding vowel is usually lengthened and often glides toward the central schwa sound. (The schwa is a reduced vowel sound, acoustically changed or weakened, that usually occurs on unaccented or unstressed, lightly pronounced syllables in words containing more than one syllable. It is a very short neutral vowel sound, and like all vowels, its precise quality varies depending on the adjacent consonants. It is sometimes signified by the pronunciation ‘uh’ or symbolized by an upside-down, rotated ‘e’.)

During at least part of the articulation of the vowel, an r-coloured vowel may have either the tip or blade of the tongue turned up, or the tip of the tongue down and the back of the tongue bunched. (The first is known as a retroflex articulation, in which the tongue articulates with the roof of the oral cavity behind the alveolar ridge, and may even be curled back to touch the hard palate; that is, it is articulated in the postalveolar to palatal region of the mouth. American English typically curls the tip of the tongue back towards the palate.) Both articulations of r-coloured vowels lower the frequency of the third formant.

R-coloured vowels are a particular challenge for singers because they can hinder the production of, as well as distort, pure vowels. Folk and Irish singers, performers of country music, as well as singers with certain regional accents, tend to have very pronounced r-coloured vowels while singing.

Many vocalists who would normally speak English with r-coloured vowels will replace them with their non-rhotic equivalents when singing in English. Since the first and second formants of words like ‘bird’ are often very similar to those of the word ‘hood’, the singer can create a nicer tone by removing the ‘r-colouring’ from the vowel and singing the ‘UH’ sound, as in the latter word. The key to singing r-coloured vowels while avoiding the characteristic acoustical distortion and lowering of the third formant is to examine the vowel preceding the ‘r’, then attempt to sing this open vowel, adding the ‘r’ sound only at the end of the sung phoneme.

For some singers, the concept of softening their ‘r’s to make them less pronounced works at creating a nicer sound while still maintaining diction.

Monophthongs and Diphthongs

The English language is made up of different types of vowels, including compound vowel structures that the student of voice needs to learn to sing as purely as possible.

A vowel sound whose quality doesn't change over the duration of the vowel is called a monophthong. Monophthongs are sometimes called ‘pure’ or ‘stable’ vowels. The vowels in the words ‘hat’ and ‘not’ are examples of monophthongs.

A vowel sound that glides from one quality to another is called a diphthong. In most dialects, the vowel in ‘boy’ is a diphthong.

The vowel sound in the word ‘my’ is a diphthong. Separating the sound into its composite vowels yields two distinct vowel sounds: ‘ah’ and ‘eeh’. When singing diphthongs, the first vowel sound is sustained primarily, and the second vowel sound is added at the very last second in order to make the word distinguishable. This technique helps to stabilize the vowel by allowing full resonance to be achieved on a single, pure vowel, since monophthong vowels have better spectral continuity.

Last updated on Sat Jan 1 22:20:28 2011