Assimilation
Assimilation is a connected speech process in which a sound changes to become more similar to a neighboring sound. It is driven by articulatory efficiency — the speech organs anticipate the next sound and begin adjusting early, or carry over a feature from the previous sound. The result is smoother, faster production at the cost of the original sound's identity.
Types
Regressive (anticipatory) assimilation — the most common type in English. A sound changes under the influence of the sound that follows it. The mouth "looks ahead."
- Place assimilation: "ten boys" → /tem bɔɪz/ (alveolar /n/ → bilabial /m/ before bilabial /b/); "ten girls" → /teŋ gɜːlz/ (alveolar /n/ → velar /ŋ/ before velar /g/)
- Voice assimilation: "have to" → /hæf tə/ (voiced /v/ → voiceless /f/ before voiceless /t/)
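The place-assimilation examples above can be sketched as a small lookup rule. This is a minimal illustration, not a full phonological model: the function name `assimilate_final_n` and the tiny consonant sets are assumptions made for the example, and words are passed in as plain IPA strings.

```python
# Illustrative subsets of consonants grouped by place of articulation.
BILABIAL = {"p", "b", "m"}
VELAR = {"k", "g"}

def assimilate_final_n(first: str, second: str) -> str:
    """Return `first` with a final /n/ shifted to the place of the next word's first sound."""
    if not first.endswith("n") or not second:
        return first
    onset = second[0]
    if onset in BILABIAL:
        return first[:-1] + "m"   # alveolar /n/ becomes bilabial /m/
    if onset in VELAR:
        return first[:-1] + "ŋ"   # alveolar /n/ becomes velar /ŋ/
    return first                  # no trigger, so the citation form is kept

print(assimilate_final_n("ten", "bɔɪz"))   # tem
print(assimilate_final_n("ten", "gɜːlz"))  # teŋ
```

The rule only inspects the following sound, which is what makes it regressive: the trigger sits to the right of the sound that changes.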
Progressive assimilation — a sound changes under the influence of the sound that precedes it. Less common in English but regular in morphology: the plural -s is pronounced /s/ after voiceless consonants (cats /kæts/) and /z/ after voiced consonants (dogs /dɒgz/).
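The plural pattern can be written as a rule that looks only at the stem's final sound. The sketch below is a simplification under stated assumptions: `plural_suffix` is a hypothetical helper, the voiceless set is a small illustrative subset, and the extra /ɪz/ branch for sibilant-final stems (as in "buses") is added for completeness; it is not part of the voicing pattern described above.

```python
# Illustrative subsets; a real inventory would be larger.
VOICELESS = {"p", "t", "k", "f", "θ"}
SIBILANTS = {"s", "z", "ʃ", "ʒ"}  # also covers stems ending in /tʃ/ and /dʒ/

def plural_suffix(stem: str) -> str:
    """Pick the plural ending from the voicing of the stem's final sound."""
    final = stem[-1]
    if final in SIBILANTS:
        return "ɪz"               # added case: sibilant-final stems take /ɪz/
    if final in VOICELESS:
        return "s"                # voicelessness carries over to the suffix
    return "z"                    # voiced consonants and vowels take /z/

print("kæt" + plural_suffix("kæt"))  # kæts
print("dɒg" + plural_suffix("dɒg"))  # dɒgz
```

Here the trigger is to the left of the sound that changes, which is the defining property of progressive assimilation.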
Coalescent assimilation — two adjacent sounds merge into a single new sound that shares features of both. This is the most noticeable type for learners:
- /t/ + /j/ → /tʃ/: "don't you" → /dəʊntʃuː/, "what you" → /wɒtʃuː/
- /d/ + /j/ → /dʒ/: "would you" → /wʊdʒuː/, "did you" → /dɪdʒuː/
- /s/ + /j/ → /ʃ/: "this year" → /ðɪʃ jɪə/ (often partial: /s/ shifts to /ʃ/ but the /j/ may survive; varies by speaker)
- /z/ + /j/ → /ʒ/: "as yet" → /æʒ jet/ (less common; careful and formal speech resists it)
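The two most regular coalescence patterns, /t/ + /j/ and /d/ + /j/, can be sketched as a merge at the word boundary. This is a minimal illustration: the `coalesce` function and `COALESCENCE` table are hypothetical names, and the /s/ and /z/ cases are left out because they are partial and speaker-dependent.

```python
# Boundary pairs that merge into a single affricate.
COALESCENCE = {
    ("t", "j"): "tʃ",
    ("d", "j"): "dʒ",
}

def coalesce(first: str, second: str) -> str:
    """Join two transcribed words, merging final /t/ or /d/ with initial /j/."""
    if first and second:
        merged = COALESCENCE.get((first[-1], second[0]))
        if merged:
            # Both boundary sounds are replaced by the one merged sound.
            return first[:-1] + merged + second[1:]
    return first + " " + second   # no merge: keep the words separate

print(coalesce("wʊd", "juː"))     # wʊdʒuː
print(coalesce("dəʊnt", "juː"))   # dəʊntʃuː
print(coalesce("ten", "bɔɪz"))    # ten bɔɪz
```

Unlike the regressive and progressive cases, both sounds at the boundary are consumed, so the output has one segment where the input had two.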
What Triggers Assimilation
Assimilation is gradient rather than categorical: how often and how completely it applies varies with the speaking situation. It occurs more frequently in:
- Fast, casual speech (relaxed register)
- High-frequency collocations and fixed phrases
- Across word boundaries where no pause intervenes
It occurs less in careful speech, emphatic pronunciation, or when a speaker is consciously articulating for clarity.
Teaching Implications
For listening, learners need to recognize assimilated forms. A Vietnamese learner hearing /wʊdʒuː/ needs to parse it as "would you" — if they only know the citation forms, this string is opaque. Exposure through natural-speed audio with transcript support is the most effective method.
For production, coalescent assimilation (/t+j/, /d+j/) is the highest-priority target because it is pervasive, rule-governed, and learnable. Drilling common phrases — "What do you...?", "Would you...?", "Don't you...?" — as chunks with their assimilated forms builds automaticity. Place assimilation (/n/ → /m/, /n/ → /ŋ/) is lower priority for production because it happens naturally as fluency increases.
Assimilation interacts closely with Elision. In "handbag" /hæmbæg/, the /d/ is elided from the cluster /ndb/, which leaves /n/ directly before /b/, and the /n/ then assimilates to /m/. Both processes operate in the same word, and teaching them together through Connected Speech gives learners a more accurate model of how English actually sounds.
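The "handbag" example can be modelled as two rewrite rules applied in sequence. This is a rough sketch under simplifying assumptions: the function names `elide_d` and `assimilate_n` are hypothetical, the consonant classes are small illustrative subsets, and treating the processes as discrete ordered steps is a modelling convenience, not a claim about how speakers produce the word.

```python
import re

def elide_d(form: str) -> str:
    """Drop /d/ between /n/ and a following bilabial or velar consonant (simplified)."""
    return re.sub(r"nd(?=[pbmkg])", "n", form)

def assimilate_n(form: str) -> str:
    """Shift /n/ to /m/ before a bilabial consonant."""
    return re.sub(r"n(?=[pbm])", "m", form)

citation = "hændbæg"

# Elision first: removing /d/ puts /n/ next to /b/, so assimilation can apply.
print(assimilate_n(elide_d(citation)))  # hæmbæg

# Assimilation first: /n/ is still followed by /d/, so nothing changes until elision.
print(elide_d(assimilate_n(citation)))  # hænbæg
```

In this model the /m/ only appears when elision has already brought /n/ and /b/ together, which is one way to show learners why the two processes are worth studying side by side.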