The three main aspects of English pronunciation


Why you should learn how to read phonetic transcriptions

When a teacher introduces a new English word, he or she often describes its pronunciation with symbols between slashes, such as:

bat /bæt/

This is called a phonetic transcription. We need phonetic transcriptions because the way a word is written does not necessarily reflect how it is pronounced. The letter ‘i’, for example, can be pronounced either as /aɪ/ as in “hi” or as /ɪ/ as in “hint”. Some words confuse us even further because their spelling misleads us about their pronunciation, e.g., “know” (where the letter ‘k’ is silent) or “enough” (where “ough” tempts us to pronounce it /əʊ/, although it is actually /ʌf/). In other words, there is no one-to-one mapping between orthography and phonology in English, and because of this ambiguous relationship, we need a set of symbols separate from the spelling system to show how words are pronounced. Each of these symbols represents a phoneme.

Technically speaking, a phoneme is the smallest unit of sound that can distinguish one word from another. For example, the word “bat” /bæt/ is made up of three phonemes, /b/, /æ/, and /t/, and replacing /b/ with /p/ gives a different word, “pat”. While there are 26 letters in the English alphabet, there are 44 phonemes in the standard British English sound system (24 consonants and 20 vowels, 8 of which are diphthongs). Learning these phonemes is, to some extent, as important as learning the alphabet: when you look up a word in a paper dictionary, the transcription tells you how the word is pronounced, even without audio. It also helps you check what you hear and zero in on subtle differences in pronunciation between words.
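For readers who like to tinker, the mismatch between letters and phonemes can be made concrete with a tiny Python sketch. The function name split_phonemes and the symbol list are my own illustrative assumptions, not part of any dictionary or library. The list covers only the two-character symbols of British English transcription (long vowels, diphthongs, and the affricates /tʃ/ and /dʒ/); stress marks are not handled, so “enough” is written here without one.

```python
# Two-character symbols in a simplified British English inventory
# (an illustrative assumption): long vowels, diphthongs and affricates.
MULTI_CHAR_SYMBOLS = (
    "iː", "ɑː", "ɔː", "uː", "ɜː",                    # long vowels
    "ɪə", "eə", "ʊə", "eɪ", "aɪ", "ɔɪ", "əʊ", "aʊ",  # diphthongs
    "tʃ", "dʒ",                                      # affricates
)


def split_phonemes(transcription: str) -> list:
    """Split a transcription such as '/bæt/' into a list of phonemes.

    Two-character symbols are checked before single characters, so
    '/nəʊ/' becomes ['n', 'əʊ'] rather than ['n', 'ə', 'ʊ'].
    Stress marks and syllable boundaries are not handled.
    """
    text = transcription.strip("/")
    phonemes = []
    i = 0
    while i < len(text):
        for symbol in MULTI_CHAR_SYMBOLS:
            if text.startswith(symbol, i):
                phonemes.append(symbol)
                i += len(symbol)
                break
        else:
            # No two-character symbol starts here, so take one character.
            phonemes.append(text[i])
            i += 1
    return phonemes


for word, transcription in [("bat", "/bæt/"), ("know", "/nəʊ/"), ("enough", "/ɪnʌf/")]:
    phonemes = split_phonemes(transcription)
    print(f"{word}: {len(word)} letters, {len(phonemes)} phonemes {phonemes}")
```

“bat” comes out with 3 letters and 3 phonemes, but “know” has 4 letters and only 2 phonemes, and “enough” has 6 letters and 4 phonemes. One limitation of this simple approach: it cannot tell a genuine /tʃ/ from a /t/ followed by a /ʃ/ in a different syllable.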

If you are a twelfth-grader and still can’t read phonetic transcriptions, it’s time to make it a goal, because you’ll definitely need this skill for the rest of your life.

Sounds of speech: vowels and consonants

Phonemes fall into two main categories: vowels and consonants. What is a vowel? Basically, a vowel is a sound produced while the air passes through the vocal tract without being obstructed by any of the speech organs. When a doctor wants to look at our throat, they ask us to open our mouth and say “ah”; at that moment, the air flows from our lungs through the larynx and out of the mouth without meeting any obstacle on its way. That’s an example of how a vowel is produced.

A consonant, on the other hand, is produced when the flow of air is obstructed by certain organs, be it your lips, your teeth, or your tongue. These organs that stand in the way of the airflow are called articulators.

Now, let’s take a look at the articulators to see how a consonant is made. You can hold a mirror in front of your open mouth while reading this to see them more clearly:

The articulators include the two lips (1), the teeth (2), the alveolar ridge (3), the hard palate (4), the soft palate or velum (5), and the larynx (9). Some of them, such as the upper teeth, the alveolar ridge, and the hard palate, belong to the upper part of the mouth and are therefore immobile. The lower part of the mouth, however, can move thanks to the muscles of the jaw and the tongue. The tongue can be divided into three parts: the tip (6), the blade (7), and the back (8).

The three dimensions in classifying consonants

A sound is produced when air flows up from our lungs, through the larynx, and out through the mouth or the nasal cavity. If that airflow is stopped by closing the lips and then suddenly released when the lips open, we get a /b/ or a /p/ sound. These two sounds are called bilabials, because the articulators involved in producing them are the two lips. The term also illustrates the first dimension used in classifying consonants: place of articulation.

In addition, /b/ and /p/ are similar not only because they share the same place of articulation, but also because they are produced in the same way: the airflow is stopped and then suddenly released, creating a sort of small burst. Consonants produced in this way are called plosives or stops, which illustrates the second dimension in classifying consonants: manner of articulation.

There’s one more thing to mention about /b/ and /p/, and it is what distinguishes one from the other. Place two fingers on your throat, around your Adam’s apple. When you say /p/, you cannot feel any vibration, but when you say /b/, you can. The vibration comes from the vocal folds (often called vocal cords) in your larynx: they are held close together, so the air pushing through does not stop completely but makes them flap rapidly, like drawn curtains in a strong wind. A sound produced with this vibration is called voiced, and one produced without it is called voiceless. This is the third dimension in classifying consonants: voicing.

Apart from /b/ and /p/, there’s another bilabial, /m/. Although /m/ shares the same place of articulation with the other two, it is not produced with the same manner of articulation. Instead of letting the air flow out through the mouth, we let it pass through the nose, and hence /m/ is called a nasal.
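One way to picture the three dimensions is as three labels attached to each consonant. As a rough illustration (a Python sketch of my own, not a standard tool; the Consonant class and the differences function are hypothetical names), here are the three bilabials /p/, /b/, and /m/, along with a function that reports which dimensions separate two of them:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Consonant:
    symbol: str
    place: str    # place of articulation: where the airflow is obstructed
    manner: str   # manner of articulation: how the airflow is obstructed
    voiced: bool  # voicing: whether the vocal folds vibrate


P = Consonant("p", "bilabial", "plosive", False)
B = Consonant("b", "bilabial", "plosive", True)
M = Consonant("m", "bilabial", "nasal", True)


def differences(a: Consonant, b: Consonant) -> list:
    """Return the names of the dimensions in which two consonants differ."""
    diffs = []
    if a.place != b.place:
        diffs.append("place")
    if a.manner != b.manner:
        diffs.append("manner")
    if a.voiced != b.voiced:
        diffs.append("voicing")
    return diffs


print(differences(P, B))  # ['voicing']
print(differences(B, M))  # ['manner']
print(differences(P, M))  # ['manner', 'voicing']
```

/p/ and /b/ differ only in voicing, /b/ and /m/ only in manner, and /p/ and /m/ in both. Describing the remaining English consonants in the same way would amount to the consonant chart written out as data.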

With the above three dimensions, we can easily classify all the consonants in English, as shown in the following chart:

For more information, check out a post I wrote earlier: Place of articulation of English and Vietnamese consonants.

The four dimensions in classifying vowels

When we pronounce the sound /iː/ as in “cheese”, we can feel our jaw rise, bringing the lower part of the mouth towards the upper part. At the same time, our tongue is raised towards the roof of the mouth. As a result, even though the air still passes through the mouth without obstruction, the gap it flows through is quite narrow. This contrasts with the sound /æ/ as in “bad”, where the gap between the lower and the upper part of the mouth is large. This is the first dimension used in classifying vowels: tongue height. Whereas /iː/ is called a close vowel, /æ/ is an open vowel.

Now, if we compare the shape of our tongue when we pronounce /iː/ as in “cheese” and /uː/ as in “boo”, we can see that for /iː/ we raise the front of the tongue, whereas for /uː/ we pull the tongue backwards, bringing the back of the tongue closer to the soft palate. The same applies to the difference between /æ/ as in “back” and /ɑː/ as in “bark”. This is called tongue backness, the second dimension in classifying vowels.

At the same time, we can also see that when we pronounce /iː/, our lips are spread, with the corners of our mouth stretched in opposite directions, but when we pronounce /uː/, our lips become rounded, with the corners of our mouth drawn closer together. This is called lip rounding, the third dimension in classifying vowels.

There’s also one more dimension, called tenseness, illustrated by the difference between /iː/ as in “sheep” and /ɪ/ as in “ship”, between /uː/ as in “kook” and /ʊ/ as in “cook”, and between the schwa /ə/ as in “about” and /ɜː/ as in “early”.

Tongue height, tongue backness, lip rounding, and tenseness are the four dimensions used in classifying English vowels.

Now, there’s one more category of vowels called diphthongs. Diphthongs are the sounds that consist of a movement or glide from one vowel to another. For example, /ɪə/ as in “fierce” starts with /ɪ/ and ends with /ə/. There are 8 diphthongs in English: /ɪə/, /eə/, /ʊə/ (ending with /ə/), /eɪ/, /aɪ/, /ɔɪ/ (ending with /ɪ/), /əʊ/, and /aʊ/ (ending with /ʊ/). 

With the consonants and the vowels presented as above, we have the following phonemic chart of English:

Phonemes, however, are just one aspect of English pronunciation. The other two aspects are stress and intonation.


The nature of stress is simple enough: practically everyone would agree that the first syllable of words like “father”, “open”, “camera” is stressed, that the middle syllable is stressed in “potato”, “apartment”, “relation”, and that the final syllable is stressed in “about”, “receive”, “perhaps”.

What are the characteristics of stressed syllables that enable us to identify them? It is important to understand that there are two different ways of approaching this question. One is to consider what the speaker does in producing stressed syllables, and the other is to consider what characteristics of sound make a syllable seem to a listener to be stressed. In other words, we can study stress from the point of view of production and of perception; the two are obviously closely related but are not identical. The production of stress is generally believed to depend on the speaker using more muscular energy than is used for unstressed syllables. 

Many experiments have been carried out on the perception of stress, and it is clear that many different sound characteristics are important in making a syllable recognisably stressed. From the perceptual point of view, all stressed syllables have one characteristic in common, and that is prominence: stressed syllables are recognised as stressed because they are more prominent than unstressed syllables. What makes a syllable prominent, then? At least four factors are important: the loudness, the length, the pitch, and the quality of the vowel in the syllable. We will elaborate on this topic in another blog post.


Stress is often regarded as part of the suprasegmental phonology of English, and there’s another part called intonation. 

What is intonation? No definition is completely satisfactory, but any attempt at a definition must recognize that the pitch of the voice plays the most important part. Only in very unusual situations do we speak with fixed, unvarying pitch, and when we speak normally the pitch of our voice is constantly changing.

This area of phonology also deserves a full post of its own, so we will skip it here.

Questions about pronunciation in the National High School Graduation Examination

Since our very first English lessons at public school, teachers have taught us pronunciation with minimal pairs: pairs of words that differ in only a single sound, such as “pan” and “pen”. This method helps us tell apart two or three sounds that are similar to one another.

The knowledge I have presented above is just an attempt to systematize the fundamental aspects of English pronunciation. It will not necessarily help you in exams, but it will give you a structure, a framework, a holistic view of English as a language, which might be useful for your self-study throughout your life.

It’s worth mentioning, though, that knowing about a language and being able to talk about it with metalanguage doesn’t mean that you’re a proficient user of that language. In the National High School Graduation Examination, for example, it’s not important whether you pronounce a single English sound perfectly; it’s important whether you remember the pronunciation of a word. 

You must be familiar with the following question: 

Mark the letter A, B, C, or D on your answer sheet to indicate the word whose underlined part is pronounced differently from that of the rest in each of the following questions.

A. hear
B. clear
C. near
D. bear

(NHSGE 2014 – Code 296 – Question 1)

No matter how skillfully you pronounce the diphthongs /ɪə/ and /eə/, you cannot answer this question if you don’t know how these four words are pronounced. This is why paying attention to the phonetic transcriptions of words in the dictionary is paramount, and why learning pronunciation must be part of learning vocabulary.
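To make the point concrete, here is a small Python sketch (my own illustration; odd_one_out is a hypothetical helper, not any official tool). The function itself is trivial: everything depends on the transcriptions you feed it, which is exactly the knowledge the question tests. I assume the underlined part of each option is the letters “ear”, transcribed as in British English dictionaries: hear /hɪə/, clear /klɪə/, near /nɪə/, bear /beə/.

```python
from collections import Counter
from typing import Dict, Optional


def odd_one_out(underlined_sounds: Dict[str, str]) -> Optional[str]:
    """Return the option whose underlined part is pronounced differently.

    `underlined_sounds` maps each answer option to the transcription of
    its underlined part. Returns None unless exactly one option differs.
    """
    counts = Counter(underlined_sounds.values())
    odd = [word for word, sound in underlined_sounds.items() if counts[sound] == 1]
    return odd[0] if len(odd) == 1 else None


# Sound of the underlined "ear" in each option (assumed, see above).
question = {"hear": "ɪə", "clear": "ɪə", "near": "ɪə", "bear": "eə"}
print(odd_one_out(question))  # bear
```

Without the dictionary transcriptions in that last mapping, neither a program nor a student can pick out “bear”.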
