PRINTED FROM the OXFORD RESEARCH ENCYCLOPEDIA, LINGUISTICS. (c) Oxford University Press USA, 2016. All Rights Reserved. Personal use only; commercial use is strictly prohibited (for details see Privacy Policy and Legal Notice).

date: 24 April 2018


Summary and Keywords

When the phonological form of a morpheme—a unit of meaning that cannot be decomposed further into smaller units of meaning—involves a particular melodic pattern as part of its sound shape, this morpheme is specified for tone. In view of this definition, phrase- and utterance-level melodies—also known as intonation—are not to be interpreted as instances of tone. That is, whereas the question “Tomorrow?” may be uttered with a rising melody, this melody is not tone, because it is not a part of the lexical specification of the morpheme tomorrow. A language that presents morphemes that are specified with specific melodies is called a tone language. It is not the case that in a tone language every morpheme, content word, or syllable would be specified for tone. Tonal specification can be highly restricted within the lexicon. Examples of such sparsely specified tone languages include Swedish, Japanese, and Ekagi (a language spoken in the Indonesian part of New Guinea); in these languages, only some syllables in some words are specified for tone. There are also tone languages where each and every syllable of each and every word has a specification. Vietnamese and Shilluk (a language spoken in South Sudan) illustrate this configuration. Tone languages also vary greatly in terms of the inventory of phonological tone forms. The smallest possible inventory contrasts one specification with the absence of specification. But there are also tone languages with eight or more distinctive tone categories. The physical (acoustic) realization of the tone categories is primarily fundamental frequency (F0), which is perceived as pitch. However, often other phonetic correlates are also involved, in particular voice quality. Tone plays a prominent role in the study of phonology because of its structural complexity. 
That is, in many languages, the way a tone surfaces is conditioned by factors such as the segmental composition of the morpheme, the tonal specifications of surrounding constituents, morphosyntax, and intonation. On top of this, tone is diachronically unstable. This means that, when a language has tone, we can expect to find considerable variation between dialects, and more of it than in relation to other parts of the sound system.

Keywords: tone, tone sandhi, underspecification, fundamental frequency, pitch, autosegmental theory, contour tones, tone languages

1. Examples of Contrastive Tone

A tone language can be defined as a language “in which an indication of pitch enters into the lexical realization of at least some morphemes” (Hyman, 2006, p. 229). As a first illustration of this phenomenon, consider the words in (1). They come from Sumi, a Tibeto-Burman language spoken in India, and the discussion is based on the descriptive analysis in Teo (2014). Amos Teo also provided the sound examples.


(1) a. /m̀.là/ ‘to work’

b. /m̀.lā/ ‘to foam’

c. /m̄.lá/ ‘to be easy’

Audio. Recording of (1a), embedded in the sentence /nòjē __ pī pì ànī/ ‘You said _’.

Audio. Recording of (1b), embedded in the sentence /nòjē __ pī pì ànī/ ‘You said _’.

Audio. Recording of (1c), embedded in the sentence /nòjē __ pī pì ànī/ ‘You said _’.

In Sumi in general, each syllable is specified for one of three distinctive tone patterns: Low, Mid, or High. The Low, Mid, and High tones are transcribed with a diacritic on the nucleus or core of the syllable, for example, à, ā, á, respectively; a summary of conventions to transcribe tone patterns follows in Section 8, “The Transcription of Tone.” The three verb morphemes in (1) are identical in terms of segmental composition (/m.la/, where the dot marks a syllable boundary), but they diverge in terms of the specification for tone. And while Teo transcribes a tone on the first syllable, the syllabic nasal /m/, he notes that the specification on the penultimate syllable is not distinctive independently of that of the final syllable of the morpheme (Teo, 2014, p. 45). In this final syllable, we find a Low tone for ‘to work’, a Mid one for ‘to foam’, and a High tone for ‘to be easy’. These three morphemes constitute a minimal set for tone: a group of words in which tone is the only feature that distinguishes them, while all other specifications are the same. Teo (2014, pp. 48–53) presents measurements of fundamental frequency (f0), the acoustic correlate that most determines the perceived pitch of a syllable. These measurements show that the Low, Mid, and High tones are realized roughly equidistantly in the phonetic space.

A second example illustrating tonal contrast is drawn from Shilluk, a Nilo-Saharan language spoken in South Sudan. In this language, tone plays an important role in the morphology, in addition to its lexical function. That is, while tone distinguishes unrelated morphemes, it also has high functional load in morphological paradigms. Table 1 shows extracts from the paradigms of two transitive verbs. These two verbs are identical in terms of their segmental composition. There are eight distinctive tone patterns in total in Shilluk, all of which are illustrated in Table 1: the level tone patterns Low (cv̀c), Mid (cv̄c) and High (cv́c); a Rise (cv̌c); and no less than four different falling contours: Low Fall (cv̂c), High Fall (cv̂́c), Late Fall (cv́c̀), and High Fall to Mid (cv́c̄). As seen from Table 1, the basic past tense and the past centrifugal involve the same tone pattern on the stem syllable for the two verbs: High Fall and Late Fall, respectively. The remaining three past-tense inflections in Table 1 are also marked through tone, but in a way that is specific to verb class, that is, they are marked differently for the Low and High verb classes. Note that, in each of the three cases, the High verb has a tonal specification that involves higher f0 than the corresponding Low verb; it is on this basis that the classes can be labeled Low and High. For example, the applicative is marked by a Mid tone on the Low verb and by a High Fall to Mid on the High verb.

Table 1. Illustrations of the Lexical and Morphological Specifications for Tone in Two Segmentally Identical Shilluk Verbs, Representing Two of the Seven Classes of Transitive Verbs

Inflection | Low class: {lɛŋ} ‘take away gradually’ | High class: {lɛŋ} ‘drum’
past | a. á-lɛ̂́ŋ | f. á-lɛ̂́ŋ
past 2sg | b. á-lɛ̀ŋ | g. á-lɛ̂ŋ
past applic | c. á-lɛ̄ŋ | h. á-lɛ̂́ŋ̄
past applic 2sg | d. á-lɛ̌ŋ | i. á-lɛ́ŋ
past centrifugal | e. á-lɛ́ŋ̀ | j. á-lɛ́ŋ̀

Audio. Recording of Table 1.a: á-lɛ̂́ŋ.

Audio. Recording of Table 1.b: á-lɛ̀ŋ.

Audio. Recording of Table 1.c: á-lɛ̄ŋ.

Audio. Recording of Table 1.d: á-lɛ̌ŋ.

Audio. Recording of Table 1.e: á-lɛ́ŋ̀.

Audio. Recording of Table 1.f: á-lɛ̂́ŋ.

Audio. Recording of Table 1.g: á-lɛ̂ŋ.

Audio. Recording of Table 1.h: á-lɛ̂́ŋ̄.

Audio. Recording of Table 1.i: á-lɛ́ŋ.

Audio. Recording of Table 1.j: á-lɛ́ŋ̀.

2. Methodology and Theory in the Study of Tone

The fundamentals of the methodology for the first-hand investigation of tone were laid out in a comprehensive manner almost 70 years ago, in Pike (1948), and in particular in the didactically oriented first part of this monograph. Recently, Yu (2014) has presented a reappraisal of this seminal work, highlighting its sound experimental orientation. In a nutshell, Pike advises putting together sets of words that are comparable in terms of both sound patterns and grammatical characteristics, for example, disyllabic nouns consisting of open syllables with sonorant onsets. When other factors are kept constant within each set, any salient difference in melodic pattern is likely to be indicative of lexical specification. These sets of words are then elicited in a range of contexts. The value of this approach is evident from the Sumi example above. The three members of the minimal set are all verbs, and the frame sentence is fixed. Given this, it is highly likely that the melodic differences observed reflect the phonological specification of the morphemes. Snider (2014) provides didactic guidance related to this topic, emphasizing the distinction between surface-phonological and underlying specification. I will illustrate this fundamental distinction using an example from Mixtec, an Otomanguean language spoken in Mexico. The data and analysis are taken from McKendry’s (2013) analysis of the Southeastern Nochixtlán variety of the language, and Inga McKendry also provided the sound examples. When the two words in (2) are pronounced in the citation form, both have a Mid tone on both syllables. This is the surface-phonological specification of the two words.


(2) a. /nāʔā/ ‘hand’

b. /β‎ɛ̄ʔɛ̄/ ‘house’


Audio. Recording of (2.a): /nāʔā/.

Audio. Recording of (2.b): /β‎ɛ̄ʔɛ̄/.

When they are uttered as the initial term in a possessive noun phrase, that is, as the possessed term, they still have a Mid tone on each syllable (3). But note how, in that context, these two words differ in their effect on the specification for tone on the following vowel, the first vowel of the possessor noun. The first syllable of ‘coyote’ has a High tone when following /naʔa/ ‘hand’, but a Mid tone when following /βɛʔɛ/ ‘house’. This suggests that, in (3a), the High tone on the first syllable of /jajan/ ‘coyote’ is somehow to be attributed to the morpheme /naʔa/ ‘hand’. This is illustrated by the sound examples. Importantly, and perhaps counter-intuitively, the citation forms of the nouns in (2) are not, by themselves, a reliable guide to the specification for tone. We will soon come back to the question of how the specification for tone can be represented.


(3) a. /nāʔā jájàn/ ‘paw of the coyote’

b. /βɛ̄ʔɛ̄ jājàn/ ‘house of the coyote’

Audio. Recording of (3.a): /nāʔā jájàn/.

Audio. Recording of (3.b): /β‎ɛ̄ʔɛ̄ jājàn/.

The conclusion that surface-phonological specification is not necessarily the same as underlying or lexical specification holds across tone languages. That is, while there are languages in which the underlying specification for tone of morphemes can be inferred directly from the surface-phonological form in any context, including the citation form, it is quite common for the situation to be more complex, so that the observed or surface-phonological specification cannot be assumed to be identical to the underlying specification. Because of this, the analysis of a tone language is an inductive process, whereby a researcher finds out the surface-phonological specification of a morpheme in a range of contexts, before inferring the underlying specification and the processes from which the surface patterns can be derived. The methodology can be enhanced by recording the forms, which enables the researcher to listen across speakers and contexts, and to examine f0 traces. Further guidance on the first-hand investigation of tone can be found in the papers in Bird and Hyman (2014).

When it comes to the phonological analysis of tone, autosegmental theory has been the main theory for several decades, its key assumptions and insights being widely adopted. Central to the autosegmental approach to tone is the understanding that tone is best interpreted as an autonomous dimension of speech, and therefore best represented separately from the consonantal and vocalic composition of a morpheme (Leben, 1973; Goldsmith, 1976; Williams, 1976). The first to articulate the key principles of autosegmental theory is Williams (1976), which has been available in manuscript form from 1971 onwards. He hypothesizes that tones are a feature of morphemes, rather than of phonemes or syllables. He also postulates a language-independent Tone Mapping rule, which “maps from left to right a sequence of tones to a sequence of syllables” (Williams, 1976, p. 469). Williams’ approach informed Leben (1973) and Goldsmith (1976). The theory has been developed in numerous ways since then. One of these developments is the recognition that the mapping of tones to morphemes is not the same in all tone languages. The linking may simply be lexically specified. Or the mapping may be regular, but it may not start at the left edge of the word (Clements & Ford, 1979).
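The left-to-right mapping described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Williams' own formalization: the function name `map_tones`, the `start` parameter (a stand-in for systems in which mapping does not begin at the left edge), and the choice to leave surplus tones floating and surplus syllables unassociated are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


def map_tones(tones: List[str], n_syllables: int, start: int = 0
              ) -> Tuple[List[Optional[str]], List[str]]:
    """Associate tones one-to-one with syllables, from left to right.

    `start` is the index of the first syllable that receives a tone
    (hypothetical parameter; 0 means mapping begins at the left edge).
    Returns (tone per syllable, floating tones). A syllable without a
    tone is None; tones with no syllable left are returned as floating.
    """
    linked: List[Optional[str]] = [None] * n_syllables
    free_slots = range(start, n_syllables)
    for slot, tone in zip(free_slots, tones):
        linked[slot] = tone
    floating = tones[len(free_slots):]
    return linked, floating


# Mapping from the left edge: two tones, three syllables.
print(map_tones(["H", "L"], 3))           # (['H', 'L', None], [])
# Mapping that starts at the second syllable: one tone is left floating.
print(map_tones(["M", "H"], 2, start=1))  # ([None, 'M'], ['H'])
```

What happens to syllables that remain toneless (spreading, default insertion) differs between analyses and languages, which is why the sketch leaves them as `None`.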

To illustrate how autosegmental theory can be used to deal with the autonomy of tones, we return to the Mixtec phenomena introduced above in (2, 3). Illustration (4) shows the autosegmental representation of the two nouns in (3a), as postulated in McKendry (2013). Illustration (4a) shows the hypothesized underlying specification of the two nouns. McKendry postulates that /naʔa/ ‘hand’ has a Mid tone and a High tone, and that /jajan/ ‘coyote’ has a Low tone and a High tone. These tones are represented on a tonal tier, which is separate from the segmental sequence and connected to it through association lines. The analysis hinges on the understanding that the Mixtec tones are linked with vowel units,1 in such a way that the first tone of a lexical item is linked with the second vowel unit, and that tones associate rightwards from that point (cf. Clements & Ford, 1979). In this way, the first tone that is part of the lexical specification of /naʔa/, the Mid, is associated with its second vowel unit, as seen in (4a). Still in the citation form of /naʔa/ (4a), the High tone lacks a vowel it can associate with; it remains unassociated (‘floating’), and there is no trace of it in the phonetic realization.



In contrast, when there is a following noun (4b), the High tone of /naʔa/ associates across the word boundary, with the initial syllable of /jajan/, which is itself toneless. This is expressed by the interrupted line in (4b). In the same vein, the High tone of /jajan/ is also a floating tone; it is not realized, for lack of a vowel unit to dock onto. Finally, the initial syllable of /naʔa/ gets a Mid tone associated with it by default.

In the same way, the Mid tone on the initial vowel unit of /jajan/ in (3b) is attributed to a floating tone linked rightwards from /βɛʔɛ/ ‘house’, which has a Mid-Mid specification for tone. And here again, the Mid tone on the initial vowel in (2b) and (3b) is attributed to a default insertion. This analysis correctly predicts that the melodic pattern of a given Mixtec word depends in part on the lexical item that precedes it.
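The association steps of this analysis can be traced with a small Python sketch. It is a simplified model written for illustration, not McKendry's formalization: the function `surface_tones`, the encoding of each word as a pair of a vowel-unit count and a tone list, and the assumption that each of the three nouns has exactly two vowel units are all hypothetical choices.

```python
from typing import List, Optional, Tuple

DEFAULT_TONE = "M"  # Mid is inserted by default on toneless vowel units

# (number of vowel units, lexical tones); vowel counts are assumed here
HAND = (2, ["M", "H"])    # /naʔa/
HOUSE = (2, ["M", "M"])   # /βɛʔɛ/
COYOTE = (2, ["L", "H"])  # /jajan/


def surface_tones(phrase: List[Tuple[int, List[str]]]) -> List[List[str]]:
    """Return the surface tone of every vowel unit, word by word."""
    result: List[List[str]] = []
    floating: Optional[str] = None  # tone left over from the previous word
    for n_vowels, tones in phrase:
        slots: List[Optional[str]] = [None] * n_vowels
        # A floating tone from the preceding word docks on the first vowel.
        if floating is not None:
            slots[0] = floating
        # Lexical tones associate rightwards from the second vowel unit.
        n_linked = min(len(tones), n_vowels - 1)
        for i in range(n_linked):
            slots[1 + i] = tones[i]
        # The first tone without a vowel unit floats; it surfaces only if
        # a following word offers a docking site.
        leftover = tones[n_linked:]
        floating = leftover[0] if leftover else None
        # Vowel units that are still toneless receive the default Mid.
        result.append([t if t is not None else DEFAULT_TONE for t in slots])
    return result


print(surface_tones([HAND]))           # [['M', 'M']]  citation form, as in (2a)
print(surface_tones([HAND, COYOTE]))   # [['M', 'M'], ['H', 'L']]  as in (3a)
print(surface_tones([HOUSE, COYOTE]))  # [['M', 'M'], ['M', 'L']]  as in (3b)
```

The two citation forms come out identical, while the phrases differ on the first vowel of ‘coyote’, which is the pattern reported in (2) and (3).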

It is the autonomous nature of tonal phenomena like these that has led to the development of autosegmental theory; we need the flexibility afforded by a separate tonal tier, because of the observed autonomy of tones relative to the segmental sequence. Across the world’s languages, it is not uncommon for tones to cross word boundaries, and it is phenomena like these that make tone an area of phonology that displays particular complexity. In the words of Hyman (2013, p. 16): “[T]one can do everything segments and non-tonal prosodies can do, but segments and non-tonal prosodies cannot do everything tone can do.”

In another important contribution to the development of autosegmental theory, Leben (1973) used this autonomous interpretation of tone to decompose contour tones, that is, tone configurations involving more than one target within a syllable, into level tone targets. That is, a Falling tone can often best be interpreted as a sequence of High and Low tone targets, on the basis of evidence from the tonal phonology (cf. Yip, 1989).

3. The Typology of Tone Systems

Tone systems vary greatly in the extent to which syllables and morphemes are specified for tone. In Shilluk, each morpheme is specified for tone. Note, however, that it is not unusual for only part of the lexicon to involve tonal specification. Swedish, for example, is a tone language in which a substantial proportion of the lexicon is not specified for tone. The contrast is illustrated by the minimal set in (5). The patterns in this language are traditionally referred to as word-accent I and word-accent II. The privative analysis of the Swedish tone system presented here is the one advanced for the Central Standard Swedish dialect by Riad (1998, 2006), which builds on evidence in Engstrand (1995, 1997).


(5) /ˈmilan/ ‘Milan’ (word-accent I)

/ˈmílan/ ‘the charcoal stack’ (word-accent II)

Note that the transcription of the word with word-accent I does not involve a lexical specification at all, and that the word with word-accent II involves lexical specification on the stressed syllable only, where Riad postulates a High lexical tone.2 In other words, the Swedish tone contrast is hypothesized to be of a privative nature: a specification for tone contrasts with the absence of specification. Section 6, “Tone and Intonation,” will come back to the Stockholm Swedish tone system, because intonation plays an important role here alongside tone. But the point to consider for now is that a tone language does not necessarily have a lexical specification for tone on every morpheme, nor on every syllable. This issue, the status of tone in systems where the specification is limited, has been a central controversy in the typology of tone systems.

Earlier on, the definition of a tone language was cited as “one in which an indication of pitch enters into the lexical realization of at least some morphemes” (Hyman, 2006, p. 229). In contrast, in the middle of the 20th century, the concept of a tone language was defined in a more restrictive manner. Pike (1948), in his seminal work on the study of tone, defines a tone language as having “lexically significant, contrastive, but relative pitch on each syllable” (Pike, 1948, p. 43). Stockholm Swedish, where most morphemes are toneless and where the morphemes that are specified for tone only have a specification on one syllable, is a tone language according to Hyman’s definition, but not according to Pike’s definition. As it turns out, it is very common for the lexical specification for tone not to cover each and every morpheme or syllable. Even Mandarin Chinese, the first example of a tone language that comes to mind for most people, has toneless syllables. Chen (2000, p. 58) describes a relative clause marker /de/, the tone pattern of which is determined by the specification of the syllable to its left. So whereas /mǎ/ ‘horse’ has a low-rising contour in utterance-final position, the final high target of this configuration appears on the toneless function morpheme /de/, i.e., /mà dé/ ‘of the horse’. Sumi, the Tibeto-Burman language introduced in Section 1, “Examples of Contrastive Tone,” presents a similar phenomenon (Teo, 2014, p. 69).

In Mandarin and Sumi, toneless morphemes constitute a minority. But there are also many languages like Stockholm Swedish, in which toneless morphemes are very common, including Basque (Hualde, Elordieta, Gaminde, & Smiljanić, 2002), several varieties of Japanese (Haraguchi, 1977), and Ekagi (Hyman & Kobepa, 2013), a Papuan language spoken in Indonesia. If we adopt Pike’s definition, there is the problem of how to conceive of these tone systems in which the specification is sparse. One approach has been to treat these restricted tone systems as a typologically separate class, known as “lexical pitch-accent” languages (e.g., van der Hulst & Smith, 1988). The difficulty with this approach is that there is no obvious cut-off point between a densely specified and a sparsely specified system. One of the first to make this point was McCawley (1978). He points this out in relation to several languages, including dialects of Japanese that diverge in the extent to which syllables are specified for tone. The Tokyo dialect presents a purely privative contrast: some words have a single syllable per word specified for tone, whereas other words are toneless, that is, they have no such specification. This contrast is similar, in a sense, to the one of Swedish. However, there are also dialects with more richly specified inventories, such as the Osaka dialect. Given this, it is not insightful to draw a fundamental typological divide here. Hyman’s (2006) definition is to be seen as an attempt to define the phenomenon of tone in a manner that is maximally inclusive. In this perspective, restrictions on the specification of tone are conceived of as natural characteristics of the phenomenon tone, rather than as characteristics that make the phenomenon less tonal in nature.

At the other end of the typological range are densely specified tone systems with rich inventories. In terms of level tones, there are various languages that present five contrastive tone levels, that is, tones involving a single melodic target—see, for example, Wedekind (1983) on Benchnon, and Kuang (2013) on Black Miao. Illustration (6) displays a minimal set example of the five-way contrast of level tones in Black Miao. The sound examples were provided by Jianjing Kuang.


(6) a. pà̀

b. pà

c. pā

d. pá

e. pá́

‘drive away’





Audio. Recording of (6.a) /pà̀/.

Audio. Recording of (6.b) /pà/.

Audio. Recording of (6.c) /pā/.

Audio. Recording of (6.d) /pá/.

Audio. Recording of (6.e) /pá́/.

In terms of contour tones, that is, tone categories whereby two targets are associated with a syllable, it has recently become clear that the timing, also known as the alignment, can be distinctive within the syllable domain (DiCanio, Amith, & Castillo García, 2014; Remijsen, 2013; Remijsen & Ayoker, 2014). Table 1 presents two minimal-set examples of this from Shilluk, in the contrast between the High Fall, which is early-aligned, and the Late Fall, which involves the same tone targets but is late-aligned. Leaving aside specific dimensions of contrast, it is clear that there are numerous languages with more than eight tone categories. Examples include Iau (Bateman, 1990), a language spoken on New Guinea, and the Otomanguean languages San Quiahije Chatino (Cruz, 2011; Sullivant, 2011) and San Martín Itunyoso Trique (DiCanio, 2009a), both spoken in Mexico.

4. Factors Affecting the Distribution of Tones

The tone system of Sumi, introduced in Section 1, “Examples of Contrastive Tone,” has Low, Mid, and High tone categories. If, in disyllabic words, all of the logically possible combinations were attested, there would be a total of nine different patterns: Low on the initial syllable followed by either Low, Mid, or High on the final one; and likewise three patterns each with Mid and High on the initial syllable. In fact, the inventory is more restricted, at least in disyllabic words that consist of a single morpheme. All three of the tones appear on the final syllable, but tone is not distinctive on the initial syllable. Instead, only Low-Low, Low-Mid, and Mid-High are found. This type of restriction is not uncommon; most tone systems present some limitation on the inventory of tone patterns. Mian, a Papuan language spoken in Papua New Guinea, like Sumi allows for a greater inventory in word-final position than elsewhere: two tone targets can appear on a word-final syllable, but only one in a non-final syllable (Fedden, 2007, p. 71). In both languages, then, the complete inventory is only found in word-final position. The explanation for this is that the word-final syllable is an environment that offers more time in which to realize the tone patterns, as it tends to have a greater duration (cf. Teo, 2014, p. 54). The other languages discussed in the previous sections reveal other factors limiting the distribution. In Shilluk, the full inventory of eight distinctive categories is only found in monosyllabic stems. Affixes and function morphemes only carry Low, Mid, or High. Here as well, time pressure is at issue: stem syllables, on which the full inventory appears, are closed (that is, they include a coda consonant) in Shilluk, whereas affixes tend to be open and short. And in Stockholm Swedish, the lexical tone is only found on stressed syllables, again an environment that involves greater duration.
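The arithmetic behind the Sumi restriction can be checked with a short Python sketch. It simply enumerates the logically possible disyllabic patterns and compares them with the three attested ones reported above; the variable names are chosen for this example only.

```python
from itertools import product

TONES = ["Low", "Mid", "High"]

# All logically possible (initial, final) combinations: 3 x 3 = 9.
possible = list(product(TONES, repeat=2))

# The patterns reported for monomorphemic disyllabic words in Sumi.
attested = [("Low", "Low"), ("Low", "Mid"), ("Mid", "High")]

unattested = [pattern for pattern in possible if pattern not in attested]

# Every tone occurs on the final syllable ...
final_tones = {final for _, final in attested}
# ... and the tone of the initial syllable follows from that of the final one.
initial_by_final = {final: initial for initial, final in attested}

print(len(possible), len(attested), len(unattested))  # 9 3 6
print(final_tones == set(TONES))                      # True
print(initial_by_final)
```

Because each final tone co-occurs with exactly one initial tone, the initial syllable carries no independent contrast, in line with the description above.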

The influence of factors relating to duration on the distribution of tones is particularly clear in relation to contour tones. Contour tones are tone patterns that involve more than one target on a single syllable. Rising contour tones involve a lower target followed by a higher one; falling contour tones, a higher target followed by a lower one. Contour tones are distinguished from level tones, such as Low, Mid, and High, which involve a single target only. Zhang (2002) presents a survey of 187 languages that have contour tones. He finds that, in the majority of these, there are limitations on the environments in which they can appear, and that what matters is not so much duration in general but rather sonorous duration, the time domain on which tone patterns can be realized phonetically.

In addition to the factors already mentioned above (word-final position, stress, closed syllable), there is also vowel length. Given the relevance of sonorous duration to the distribution of tone patterns, it is not surprising that many languages allow for a wider range of tone patterns on syllables that have a long vowel than on those that have a short vowel. The interaction of tone with vowel length is illustrated in (7), on the basis of Stegen’s (2002) investigation of Rangi, a Bantu language of Tanzania. At the syllable level, Rangi has Low, High, Fall, and Rise tone categories. But the two contour tone patterns, Fall and Rise, are only found on syllables with a long vowel (Stegen, 2002, p. 132).


(7) mʊ̀kʊ̀vʊ̀ ‘navel’

ìbâandà ‘hut’

ìbǎahɪ̀rà ‘feather’

ìbátà ‘duck’

bàankà ‘room’

mʊ̀tɪ̂ɪkɔ̀ ‘ladle’

In autosegmental theory, this interaction between vowel length and the inventory of tone patterns is represented with reference to the notion of the tone-bearing unit (see e.g., Odden, 1995). The tone-bearing unit (TBU) is the constituent with which the tonal units are associated. If the internal constituency of the syllable is irrelevant to the association of tone patterns in a given language, then the syllable is postulated to be the TBU. If instead the association hinges on the internal composition of the syllable, then those elements that matter are postulated to be the TBU. This approach fits well with the Rangi data introduced in (7). Stegen (2002) postulates that short vowels count for one abstract weight unit, or mora, and long vowels for two such units. Contour tones are hypothesized to be composed of High and Low tone targets, which associate with these moras. It follows then that a syllable with a long vowel can accommodate a contour (two-target) tone category, whereas a syllable with a short vowel cannot. This analysis, which is the one postulated in Stegen (2002) for Rangi, is illustrated in the representation in (8). The second syllable of the word meaning “ladle” has a long vowel and therefore projects two moras, that is, it is bimoraic. Hence two tone targets can be associated with it, H and L, which surface as a Falling tone.



Importantly, it is inferred from the language phenomena which elements are moraic. For example, if contour tones were equally allowed in Rangi on syllables that have a sonorant coda, then these coda consonants would be postulated to project a mora.
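The mora-based reasoning can be made concrete with a small Python sketch. It assumes, as in the analysis described above, that each tone target needs its own mora; the function names and the `coda_is_moraic` switch are hypothetical, the latter standing for the counterfactual scenario in which sonorant codas would count as moraic.

```python
def mora_count(long_vowel: bool, sonorant_coda: bool = False,
               coda_is_moraic: bool = False) -> int:
    """Number of moras a syllable projects under the given assumptions."""
    moras = 2 if long_vowel else 1
    # Only relevant in a language where sonorant codas are tone-bearing.
    if sonorant_coda and coda_is_moraic:
        moras += 1
    return moras


def can_bear(tone_targets: str, long_vowel: bool, sonorant_coda: bool = False,
             coda_is_moraic: bool = False) -> bool:
    """True if every target ('H' or 'L') in the tone can link to a mora."""
    return len(tone_targets) <= mora_count(long_vowel, sonorant_coda,
                                           coda_is_moraic)


# Fall = "HL", Rise = "LH", level tones = "H" or "L".
print(can_bear("HL", long_vowel=True))    # True: long vowel, two moras
print(can_bear("HL", long_vowel=False))   # False: short vowel, one mora
print(can_bear("L", long_vowel=False))    # True: level tone on a short vowel
# Counterfactual: a short vowel plus a moraic sonorant coda could host a Fall.
print(can_bear("HL", long_vowel=False, sonorant_coda=True,
               coda_is_moraic=True))      # True
```

Which constituents count as moraic is an empirical question per language, so the switch should be set on the basis of the observed distribution of contour tones, not decided in advance.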

The fact that sonorous duration is so important to tonal realization relates to the fact that there are phonetic limitations on the production and perception of tone patterns. On the production side, there is the time required to implement a change from one tonal target to another. The main phonetic correlate of tone patterns is the fundamental frequency (f0) that results from vocal fold vibration. Xu and Sun (2002) carried out an experiment in which speakers produced alternations between low and high f0 as fast as possible. They found that the speed of f0 change is high in transition, but much slower at the turning points. As a result, realizing a fall in f0 of around four semitones (ST; for example, 150 to 120 Hz), which is not uncommon in realizing a Low tone after a High one, takes 124 ms on average—a time interval that may exceed the duration of a vowel in many environments.
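The semitone figure in the example can be verified with the standard conversion from a frequency ratio to semitones, 12 times the base-2 logarithm of the ratio. The Python sketch below applies it to the 150 to 120 Hz fall; the helper `fall_fits` and the vowel durations passed to it are hypothetical illustrations, and the 124 ms default is the average reported above for a fall of this size.

```python
import math


def semitones(f_from_hz: float, f_to_hz: float) -> float:
    """Signed distance in semitones from one frequency to another."""
    return 12.0 * math.log2(f_to_hz / f_from_hz)


def fall_fits(vowel_ms: float, required_ms: float = 124.0) -> bool:
    """True if the vowel is long enough for the fall to be completed in it."""
    return vowel_ms >= required_ms


fall = semitones(150.0, 120.0)
print(round(fall, 2))    # -3.86, i.e. a fall of roughly four semitones

# Hypothetical vowel durations, for illustration only.
print(fall_fits(80.0))   # False: the fall cannot be completed in 80 ms
print(fall_fits(180.0))  # True
```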

There are also perceptual limitations on the realization of tone patterns over constituents that are short in duration (’t Hart, Collier, & Cohen, 1990; House, 2004). In relation to contour tones, the most important one is the glissando threshold, which reflects the hearer’s ability to perceive f0 changes as pitch contours rather than as pitch levels (’t Hart, Collier, & Cohen, 1990). This threshold is determined by the size and duration of the f0 change. Concretely, for a change in f0 to be perceived as a change in pitch, the shorter the duration over which an f0 change is realized, the greater in excursion it needs to be. This limitation conspires with the speed of f0 change to make contour tones problematic when sonorous duration is limited.
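The trade-off between duration and excursion can be illustrated numerically. The text above does not give a formula; the Python sketch below assumes a commonly cited approximation of the glissando threshold, 0.16/T² semitones per second for a change lasting T seconds. Both the constant and the function names are assumptions of this example, and the constant is exposed as a parameter because other values are also in use.

```python
def glissando_threshold(duration_s: float, k: float = 0.16) -> float:
    """Minimum rate of f0 change (semitones per second) heard as a glide.

    Assumed approximation: k / T**2, with T the duration in seconds.
    """
    return k / duration_s ** 2


def perceived_as_glide(excursion_st: float, duration_s: float,
                       k: float = 0.16) -> bool:
    """True if the f0 change is fast enough to be heard as a pitch movement."""
    rate = abs(excursion_st) / duration_s
    return rate >= glissando_threshold(duration_s, k)


# A 4-semitone change over 100 ms is well above the assumed threshold.
print(perceived_as_glide(4.0, 0.100))  # True
# A 2-semitone change over 50 ms is below it and would be heard as a level.
print(perceived_as_glide(2.0, 0.050))  # False
```

Under this approximation the minimum excursion equals k/T semitones, so halving the duration doubles the excursion needed, which is the relation described in the paragraph above.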

5. Contextual Tone Processes

In speech, the tone patterns of morphemes tend to appear in sequence with adjacent tonal specifications. The succession is fast: syllables will often be uttered at a rate of five per second. As a result, the issue of time pressure—introduced in Section 4, “Factors Affecting the Distribution of Tones,” in relation to the inventories of tone at the level of morphemes—equally affects their implementation in words, phrases, and utterances. A study on Mandarin Chinese by Xu (1999) makes clear that tone targets realized in sequence are reached relatively late in the syllable. This means that, following a dissimilar target, a High tone will typically be realized as a high-rising f0 shape, and a Low tone as a low-falling pattern. For the same reason, f0 at the very beginning of a syllable—in a language with dense specification—is often at a level determined largely by the specification for tone of the previous syllable. For example, one of the tone categories of Mandarin Chinese, a rising contour tone, typically reaches its High target in the following syllable (Xu, 2001).

The effects of time pressure help to explain why a morpheme may appear with a phonological specification for tone that is different from its lexical specification, as a function of the phonological specification of the (tonal) context. Such contextual tone processes (also known as tone sandhi) are quite common in tone languages. The Mixtec example in Section 2, “Methodology and Theory in the Study of Tone,” is a case in point: the phonological specification of a High tone on the first syllable of /jájàn/ ‘coyote’ in (4b) can be attributed to the preceding word. It is plausible to hypothesize that this sandhi process has its diachronic origin in perseveratory phonetic phenomena of the kind reported in Xu (1999, 2001). Examples from Dinka, a Nilo-Saharan language spoken in South Sudan, illustrate tone sandhi further. Most Dinka dialects have four tone categories: Low, High, Fall, and Rise. The tones are associated with the final syllable of content morphemes, which in most morphemes is the only syllable. The Fall of Dinka is a contour tone, involving a sequence of two targets, the kind of configuration that involves greater time pressure than level tones. In several dialects, we find a tone sandhi process that could be labeled as “HL simplification”. This process systematically turns the Fall into a High, whenever it is not at the end of the utterance. This is illustrated in Figure 1, based on data from the Bor dialect of Dinka. In these displays, f0 is displayed by the red line overlaid on the waveform. Panel A (left) shows the realization of the word ŋâaap ‘sycamore’ at the end of the utterance, where its lexically specified Fall is revealed. In Panel B (right), in contrast, the same word is not in utterance-final position, and now it is High-toned.


Figure 1. An illustration of HL Simplification in the Bor dialect of Dinka. Panel A: Context illustrating the underlying specification of /ŋâaap/. Panel B: Context illustrating the application of HL Simplification.

Audio. Recording of Figure 1, panel A: /ǎ-nɔ̀ŋ ŋâaap/.

Audio. Recording of Figure 1, panel B: /bôol ěe ŋâaap máaan/.

Illustration (9) shows a schematic representation of this process using autosegmental theory. This representation hinges on the interpretation of the Fall as a sequence of a High and a Low target. The Low target is deleted if the syllable with which the Fall is associated is immediately followed by another syllable (represented as σ [sigma]), without an intervening utterance boundary. As a consequence of this process, the contrast between Fall and High tone categories in underlying specification is neutralized in non-final contexts. This means that, if Dinka had a noun /ŋáaap/ (to the best of my knowledge, it does not), it would be indistinguishable from /ŋâaap/ in the context displayed in the right-hand panel of Figure 1.
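The HL simplification process can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: each syllable is reduced to one tone label, the function name `hl_simplify` is hypothetical, and the tone sequences are read off the transcriptions given with the recordings of Figure 1.

```python
def hl_simplify(tones):
    """Turn every Fall into a High unless it sits on the utterance-final syllable.

    tones: one label per tone-bearing syllable ("Low", "High", "Fall", "Rise").
    Illustrative sketch only; the label format is an assumption.
    """
    last = len(tones) - 1
    return ["High" if tone == "Fall" and i < last else tone
            for i, tone in enumerate(tones)]


# Figure 1, panel A: /ǎ-nɔ̀ŋ ŋâaap/ (the Fall is utterance-final and survives)
panel_a = ["Rise", "Low", "Fall"]
# Figure 1, panel B: /bôol ěe ŋâaap máaan/ (both Falls are non-final)
panel_b = ["Fall", "Rise", "Fall", "High"]

print(hl_simplify(panel_a))
print(hl_simplify(panel_b))
```

Because the output for a non-final Fall is identical to that for an underlying High, the sketch also makes the neutralization in non-final contexts explicit.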



For the sake of comparison, Figure 2 shows data from another dialect, Nyarweng. This dialect faithfully renders the underlying Fall in utterance-medial contexts in the surface phonology. This can be seen in Figure 2A (left), which shows the same word ŋâaap ‘sycamore’ in a sentence that is comparable to the one in Figure 1B, pronounced by a speaker of Nyarweng Dinka. The relevance of this tone sandhi process to the issue of time pressure is illustrated by the Nyarweng utterance in Figure 2B (right). Whereas the noun ŋâaap has an overlong vowel, the final syllable in abî̤ɲ ‘cup’ has a short vowel. Note how the low target of the Fall is only implemented well into the onset /m/ of the following word, similar to Xu’s findings regarding the realization of Tone 3 of Mandarin. In summary, while there is plenty of evidence that time pressure plays a role in tone systems, it is evidently not deterministic in an absolute manner.


Figure 2. The effect of vowel length on the phonetic realization of the Falling contour tone (HL) in the Nyarweng dialect of Dinka. Panel A: HL on /ŋâaap/ (overlong vowel); Panel B: HL on /abî̤ɲ/ (short vowel).

Audio. Recording of Figure 2, panel A: /dèeŋ à-cí̤ ŋâaap máaan/.

Audio. Recording of Figure 2, panel B: /dèeŋ à-cí̤ abî̤ɲ máaan/.

One of the best-known sandhi processes is Meeussen’s Rule, which turns a High tone into a Low tone when it follows another High tone. It is found in many Bantu languages (Kenstowicz, 1994), but also in other language families. It is illustrated here on the basis of another dialect of Dinka, Luanyjang. The evidence is presented in (10). The word /aɲáaar/ ‘buffalo’ has a High tone, as seen from (10a) and the associated sound example. Illustration (10b) shows how it is affected by Meeussen’s Rule. In this sentence, /aɲáaar/ is both preceded and followed by words that have the High tone specification. In this context, the High tone of /aɲáaar/ makes the High tone on the following word /máaan/ dissimilate to a Low tone. The fact that máaan has a High tone underlyingly can be verified in (10c), where it is preceded by a Low-toned noun.










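(10)

a. aɲáaar ‘buffalo’

b. acôol à-cí̤ aɲàaar màaan (surface phonology)

acôol à-cí̤ aɲáaar máaan (underlying phonology)

‘Achol hated a buffalo’. (máaan: hate.inf)

c. acôol à-cí̤ nòoon máaan (underlying = surface phonology)

‘Achol hated grass’. (máaan: hate.inf)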

Audio. Recording of (10.a): /aɲáaar/.

Audio. Recording of (10.b): /acôol à-cí̤ aɲàaar màaan/.

Audio. Recording of (10.c): /acôol à-cí̤ nòoon máaan/.

Going back to (10b), the dissimilation process has been applied twice: /aɲáaar/ itself becomes Low-toned, because of the High tone of the preceding past-tense marker cí̤. The fact that the intervening syllable, the initial syllable of aɲáaar, is toneless is evidently irrelevant to the application of the rule. Given this, the dissimilation rule can be represented as in (11) without reference to the associations with tone-bearing units. Its application hinges solely on the composition of the tonal tier.


(11) H → L / H __
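The rule in (11) can be sketched as an operation on the tonal tier alone. The sketch below assumes that every High is checked against its left neighbour in the underlying tier, so that all targets change together; this is one way to obtain the double application seen in (10b) and is an assumption of the sketch, not a claim about the correct derivational analysis. The function name `meeussen` and the list encoding of the tier are hypothetical.

```python
def meeussen(tier):
    """Apply H -> L / H __ to a tonal tier given as a list of "H" and "L".

    Each tone is compared with its underlying left neighbour (assumption),
    so toneless syllables, which contribute nothing to the tier, are skipped.
    """
    return ["L" if tone == "H" and i > 0 and tier[i - 1] == "H" else tone
            for i, tone in enumerate(tier)]


# (10b) after the subject: à-cí̤ aɲáaar máaan; the toneless a- adds no tone
tier_10b = ["L", "H", "H", "H"]
# (10c) after the subject: à-cí̤ nòoon máaan
tier_10c = ["L", "H", "L", "H"]

print(meeussen(tier_10b))
print(meeussen(tier_10c))
```

In the first tier both the High of aɲáaar and the High of máaan are lowered, whereas in the second tier máaan keeps its High because the preceding tone is Low.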

When it comes to contextual tone processes above the word level, many tone languages show tones appearing on syllables that are some distance away from the morpheme to which they are lexically related. Patin (2009) describes this phenomenon in relation to Saghala (Bantu, Kenya), from which both the data and the analysis are drawn. To begin with, tones in Saghala surface one syllable to the right of the syllable with which they are lexically associated. This process of Tone Shift is illustrated by the verb forms in (12). The verb form in (12a) is toneless. When the object-marking affix /-zi-/ is added, a tone appears on the following syllable. So whereas /-zi-/ “sponsors” a High tone (marked by underlining), this High tone surfaces on the following syllable.


(12)

a. ni-ɣul-aɣ-a ‘I will buy’

b. ni-zi-ɣúl-aɣ-a ‘I will buy it’

A second process is Tone Spreading. In various contexts, the associated High tone will then spread rightwards up to the first tone-bearing unit of the following word, as shown in (13). Illustration (13a) shows that the words nɟovu ‘elephant(s)’ and mbwaa ‘big’ are both toneless. Illustration (13b) shows how a High tone sponsored by a demonstrative spreads across one word boundary. Here Tone Shift yields the High tone on /izí/, which then spreads across the word boundary as a result of Tone Spreading. And, in (13c), the same two processes result in a High tone target on /mbwáa/, two words to the right of the word containing the sponsor.


(13)

a. nɟovu mbwaa ‘big elephant’

b. izí nɟóvu ‘these elephants’

c. ilja nɟóvú mbwáa ‘that big elephant’
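The interaction of Tone Shift and Tone Spreading in (12) and (13) can be sketched as two steps applied in a fixed order. Everything about the encoding is an assumption made for illustration: words are lists of syllables, the syllabification (i.zi, i.lja, nɟo.vu, mbwaa) is a guess, the sponsor positions are inferred from the surface forms, and the function name `shift_and_spread` is hypothetical.

```python
def shift_and_spread(words, sponsor, spread=True):
    """Return the (word, syllable) positions that surface with a High tone.

    words:   list of words, each a list of syllables (assumed syllabification)
    sponsor: (word index, syllable index) of the syllable sponsoring the High
    spread:  whether Tone Spreading applies after Tone Shift
    """
    flat = [(w, s) for w, word in enumerate(words) for s in range(len(word))]
    start = flat.index(sponsor) + 1      # Tone Shift: one syllable to the right
    if start >= len(flat):
        return []                        # no landing site; not modelled here
    high = [flat[start]]
    if spread:
        host_word = flat[start][0]
        for w, s in flat[start + 1:]:    # Tone Spreading applies to the shifted tone
            high.append((w, s))
            if w != host_word:           # first tone-bearing unit of the next word
                break
    return high


# (12b) ni-zi-ɣul-aɣ-a, sponsor -zi-, Tone Shift only
print(shift_and_spread([["ni", "zi", "ɣul", "aɣ", "a"]], (0, 1), spread=False))
# (13b) izi nɟovu, sponsor assumed on the first syllable of the demonstrative
print(shift_and_spread([["i", "zi"], ["nɟo", "vu"]], (0, 0)))
# (13c) ilja nɟovu mbwaa, sponsor assumed on the final syllable of the demonstrative
print(shift_and_spread([["i", "lja"], ["nɟo", "vu"], ["mbwaa"]], (0, 1)))
```

Running spreading from the shifted position rather than from the sponsor is what places the High on /mbwáa/ in (13c), which is why the order of the two steps matters.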

Phenomena like the one in Saghala suggest that sandhi processes are ordered: the analysis hinges on Tone Spreading applying after Tone Shift. Another important insight from this example is that it presents evidence for syllables that are not themselves specified for tone, that is, underspecification. As explained in Section 3, “The Typology of Tone Systems,” it is very common for tone languages to present morphemes that are not lexically specified for tone. Such toneless syllables may surface tonelessly, they may get a tone as a result of a contextual process, as in Saghala, or, as we will see below in Section 6, “Tone and Intonation,” they may get an intonational tone. Importantly, we cannot assume that the realized melodic pattern reflects lexical specification.

6. Tone and Intonation

Tone refers to the specification of melodic patterns at the level of morphemes. In other words, these tones are part of the lexical specification of morphemes. In terms of the hierarchy of prosodic constituents, they are situated below or at the level of the phonological word. Above the word level, speech is structured into phonological phrases and utterances. These constituents may contribute their own tone targets, known as intonational tones. Crucially, intonational tones are not part of the lexical specification of morphemes. This section addresses the issue of how the lexical tones interact with intonational specification of tone targets.

To begin with, lexical tone and intonational tone are not mutually exclusive in a given language (cf. Gussenhoven, 2004). This is illustrated using the Stockholm Swedish contrast between word-accent I and word-accent II, introduced in Section 3, “The Typology of Tone Systems.” As noted in (5), the lexical specification of a High tone in accent II words like /ˈmílan/ contrasts with the lack of lexical specification in accent I words like /ˈmilan/. Figure 3 presents acoustic data (f0 traces). Note how the melody on word-accent I /ˈmilan/ does not sound flat: the pitch sounds high or rising on the first syllable, and low or falling on the second. The f0 trace in Figure 3A is in line with this pitch impression. And when listening to the sound example illustrating word-accent II /ˈmílan/, linked with Figure 3B, you will find that the final syllable has a salient pitch pattern to it, which is visible in the f0 trace.


Figure 3. F0 trajectories illustrating the Stockholm Swedish tone contrast in a minimal set. Uttered in utterance-final position in: Det var [target word]. ‘It was [target word]’. Based on sound data from Olle Engstrand. See also footnote 2.

Audio. Recording of Figure 3A: Det var Milan (word-accent I).

Audio. Recording of Figure 3B: Det var mílan (word-accent II).

So where do these additional high f0 targets (at the end of the first syllable of word-accent I, and on the second syllable of word-accent II) come from, if they are not part of the lexical specification? The answer is intonation. Both of these peaks are only present in particular contexts, relating to discourse prominence and information structure. They are not there in a context where the word is not marked for discourse prominence. In contrast, the lexical High target on the initial syllable of word-accent II /ˈmílan/ is present across contexts (Engstrand, 1995, 1997).

Illustration (14) shows autosegmental representations that account for the f0 patterns in Figure 3, following the analysis of Riad (2006). Both lexical and intonational tones are associated with the syllables. First, there is the lexical High tone on the first syllable of /ˈmílan/. All the other associated tones are intonational. The key concepts in relation to intonational phonology are prominence tones and edge tones (Pierrehumbert, 1980). In Swedish, prominence tones and edge tones are found both on the words that do not have a lexical specification and on words that do. One of the intonational tones is the High prominence tone. Such tones are also known as intonational pitch-accents, and they are marked with an asterisk, as in LH*. It is associated with the first syllable of /ˈmilan/ and with the final syllable of /ˈmílan/. The other intonational tone is the Low boundary tone, which is found at the end of statements. Such tones are also known as edge tones and are marked with a percentage sign, as in L%.



Across the world’s languages, intonational pitch-accents such as the prominence tone of Stockholm Swedish are typically associated with syllables that have lexical stress, that is, syllables that are prominent at the level of the prosodic word. This is what we find in word-accent I /ˈmilan/ ‘Milan’. But note how the pattern of association deviates in the case of word-accent II /ˈmílan/; here, the stressed syllable has the lexical tone, and the prominence tone (LH*) is associated with the following syllable instead. In this way, tone interacts with intonation in Stockholm Swedish: the way the intonational tone is associated depends on the presence or absence of a lexical tone. The L% edge tone associates with the final syllable in the utterance, resulting in a contour in case this syllable also has the LH* associated with it.
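The dependence of intonational association on lexical tone can be sketched as follows. The sketch covers only the utterance-final, discourse-prominent context discussed here; the function name `associate` and the list-of-lists encoding are hypothetical, and the sketch assumes that a word with a lexical High has at least one syllable after the stressed one.

```python
def associate(n_syllables, stressed, lexical_high):
    """Return, per syllable, the tones associated with it.

    The lexical High (accent II) occupies the stressed syllable and pushes the
    prominence tone LH* to the next syllable; L% goes on the final syllable.
    """
    tones = [[] for _ in range(n_syllables)]
    if lexical_high:
        if stressed + 1 >= n_syllables:
            raise ValueError("sketch assumes a syllable after the stressed one")
        tones[stressed].append("H")        # lexical tone
        tones[stressed + 1].append("LH*")  # displaced prominence tone
    else:
        tones[stressed].append("LH*")      # prominence tone on the stressed syllable
    tones[-1].append("L%")                 # edge tone of the statement
    return tones


print(associate(2, 0, lexical_high=False))  # accent I
print(associate(2, 0, lexical_high=True))   # accent II
```

In the accent II case the final syllable ends up with both LH* and L%, which corresponds to the contour described above.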

In the case of Stockholm Swedish, the specification is sparse, and intonation exploits the presence of toneless syllables. But even when the specification is dense, f0 can simultaneously convey intonational meanings. Various studies (e.g., Liu & Xu, 2005; Xu, 1999) have demonstrated that speakers of Mandarin use f0 range to express utterance-level, that is intonational, meanings such as discourse prominence and sentence type. Karlsson, House, and Svantesson (2012) arrive at a similar conclusion in relation to Kammu, an Austroasiatic language spoken in Laos, Thailand, Vietnam, and China. This language lends itself well to addressing this question, because it comprises both dialects with and without tone. As it turns out, the tonal dialect of Kammu uses “the same mechanisms for focusing and phrasing as non-tonal languages” (Karlsson et al., 2012, p. 43). At the same time, tone does impact the extent to which intonation is used: whereas in the non-tonal dialect, words in focus in spontaneous discourse are on average four semitones higher than when they recur further in the same monologue, in the tonal dialect the difference is only 1.6 semitones. This shows how lexical tone constrains intonational use of f0 range.
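For readers unfamiliar with the semitone scale used in such comparisons, the standard conversion from two f0 values in hertz to a difference in semitones is 12 times the base-2 logarithm of their ratio. The sketch below only illustrates that conversion; the function name and the frequency values are hypothetical and are not taken from Karlsson et al. (2012).

```python
import math


def semitone_difference(f0_a_hz, f0_b_hz):
    """Distance in semitones from f0_b_hz up to f0_a_hz (negative if lower)."""
    return 12 * math.log2(f0_a_hz / f0_b_hz)


# A doubling of f0 is one octave, that is, 12 semitones.
print(round(semitone_difference(200.0, 100.0), 2))
# Hypothetical values: a focused word at 220 Hz versus a later mention at 180 Hz.
print(round(semitone_difference(220.0, 180.0), 2))
```

Because the scale is logarithmic, the same difference in semitones corresponds to a larger difference in hertz for a higher-pitched voice, which makes semitones convenient for comparing speakers.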

Another way in which tone languages with a dense specification can express intonation is by crowding the intonational tone targets on syllables that already carry a lexical specification. Illustration (15) shows two examples of this from Shilluk, where yes-no questions involve the addition of a Low edge tone. Sentences (15, a & b) show statements in which the utterance-final syllable carries the Rising tone, a morphological specification marking both centrifugal spatial deixis and 2nd singular. Sentences (15, c & d) are the corresponding yes/no questions. Here, a Low target has been added at the end. This low target represents an intonational boundary tone (L%). But whereas in Stockholm Swedish, L% marks declarative utterances, in Shilluk this boundary tone is found at the end of yes/no-questions.












(15)

a. àcwʌ̌ʌt̪ á-lʊ̌ʊʊɲ ‘You went to pluck the guinea fowl’.

b. jāat̪ á-ŋɔ̌l ‘You went to cut the tree’.

c. àcwʌ̌ʌt̪ á-lʊ̌ʊʊɲ̀ ‘Did you go to pluck the guinea fowl?’

d. jāat̪ á-ŋɔ̌l̀ ‘Did you go to cut the tree?’

Audio. Recording of (15.a): /àcwʌ̌ʌt̪ á-lʊ̌ʊʊɲ/.

Audio. Recording of (15.b): /jāat̪ á-ŋɔ̌l/.

Audio. Recording of (15.c): /àcwʌ̌ʌt̪ á-lʊ̌ʊʊɲ̀/.

Audio. Recording of (15.d): /jāat̪ á-ŋɔ̌l̀/.

Note how, with the addition of the L%, the utterance-final syllable in (15, c & d) has a total of three tone targets associated with it: the Low and High targets that are constituents of the Rise contour tone, plus the Low edge tone. How is this possible? It is a cross-linguistic universal that the final syllables within phrases and utterances have greater duration; that is, they offer more sonorous duration relative to other syllables, just as syllables that are stressed or have a long vowel do (cf. Section 4, “Factors Affecting the Distribution of Tones”). This helps to explain why some tone languages allow for an intonational edge target to be added to the specification of the syllable, to the right of the lexical specification. At the same time, the realization of tone targets is considerably compressed.
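The crowding of targets on the final syllable can be made explicit with a small sketch. It assumes a simplified encoding in which each syllable carries a list of level targets (a Rise is Low plus High); the function name `add_question_boundary` is hypothetical, and the target lists are read off the transcription of (15.a).

```python
def add_question_boundary(syllable_targets):
    """Add the Low edge tone (L%) to the right of the final syllable's targets."""
    marked = [list(targets) for targets in syllable_targets]  # copy the input
    marked[-1].append("L%")
    return marked


# (15.a) /àcwʌ̌ʌt̪ á-lʊ̌ʊʊɲ/: Low, Rise (L H), High, Rise (L H)
statement = [["L"], ["L", "H"], ["H"], ["L", "H"]]
question = add_question_boundary(statement)

print(question[-1])       # targets on the utterance-final syllable
print(len(question[-1]))  # number of targets crowded on that syllable
```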

Intonational edge and prominence tones also interact with the association of lexical tones in the Dutch dialect of Venlo. Strikingly, the lexical tone *only* appears on syllables that are intonationally marked. Gussenhoven and van der Vliet (1999, p. 130) summarize the system as follows: “Representationally, the word accent opposition in the Dutch dialect of Venlo is to be understood as the contrast between the presence of a [High tone] on the second sonorant mora of a stressed syllable (Accent II) versus its absence (Accent I). In order to be included in the surface representation, the lexical tone must occur in either (or both) of two conditions: (a) when focal [prominence tone] occurs on the first mora of the same syllable, and/or (b) when the lexical tone occurs on an [intonational phrase] final mora. In other words, the contrast is neutralized in nonfinal, nonfocused contexts.” So rather than intonation being limited by tone, the nature of the conditionality is the reverse: tone is restricted here to environments that involve intonational specification.
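The two licensing conditions in the quotation amount to a simple disjunction, which can be sketched as below. The function and parameter names are hypothetical labels for conditions (a) and (b); the sketch paraphrases the quoted summary and makes no further claims about the Venlo system.

```python
def lexical_tone_surfaces(focal_tone_on_first_mora, on_ip_final_mora):
    """The lexical High appears in the surface representation if condition (a),
    condition (b), or both hold."""
    return focal_tone_on_first_mora or on_ip_final_mora


for focal in (True, False):
    for final in (True, False):
        status = "contrast realized" if lexical_tone_surfaces(focal, final) else "contrast neutralized"
        print(f"focused={focal}, IP-final={final}: {status}")
```

Only the combination of nonfocused and nonfinal yields neutralization, in line with the last sentence of the quotation.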

In conclusion, tone languages can accommodate intonational contrast in many ways. Clearly, one does not preclude the other, so tone languages cannot be opposed to “intonational” languages. As a working hypothesis, it is safe to assume that any tone language presents some intonational phenomena.

7. The Phonetic Realization of Tone

The primary acoustic correlate of tone is the fundamental frequency (f0) of the voiced part of the speech signal. F0 is determined by the rate of vibration of the vocal folds and is the main factor determining the perceived pitch or melody. However, there often are other phonetic correlates involved, probably more so when the inventory is rich. This is illustrated by Kuang’s (2013) study of Black Miao (Miao-Yao, China). Black Miao is remarkable in that it has five distinct tone levels: Low, Lower Mid, Mid, Higher Mid, and High. Kuang investigates this system both from the side of speech production and from the perceptual angle. She finds that voice quality, or non-modal phonation, plays a role in the phonetic realization of two of the tones: the Low tone is creaky (vocal fry), and the Mid is breathy. Taking into account the role of voice quality explains why, in the perception study, the Lower Mid and Higher Mid tones are the most confusable, rather than either of these in relation to the Mid, to which they are actually more similar in terms of f0. On this basis, she hypothesizes that a five-way contrast between level tones requires the involvement of additional correlates. DiCanio (2009b) reports the opposite scenario for Chong. Here, f0 appears in a supporting role, as a secondary correlate distinguishing tense phonation from three other voice qualities. These examples show that it is the relative importance of melody and voice quality as acoustic correlates and perceptual cues that determines whether a contrast is to be interpreted as one of tone or rather of voice quality. In some languages, it may not be obvious whether the contrast is best interpreted as a tone contrast or a voice quality contrast. This is illustrated by the acoustic study of Tamang by Mazaudon and Michaud (2008).

Apart from voice quality, another correlate that can be involved as a secondary correlate is the voice onset time (VOT) of a plosive in the onset of the syllable that carries the tone. VOT determines whether a consonant is perceived as voiced or voiceless. Pearce (2009) demonstrates how, in Kera, an Afroasiatic language spoken in Chad, the VOT value of the syllable is a secondary perceptual correlate of the tonal contrast. The relative importance of these correlates is the reverse of the situation in English, where f0 is a secondary correlate of the voicing contrast in stops, which has voice onset time as its primary correlate (Hombert, 1978). We can conclude that f0 is the defining and primary correlate of tonal contrasts, but not the exclusive correlate.

8. The Transcription of Tone

Three different conventions to transcribe tone patterns are in use: diacritics, Chao tone letters, and numbers. These are illustrated in Table 2, using as an example the inventory of tone patterns of Black Miao (Kuang, 2013), the level tones of which are familiar from Section 3, “The Typology of Tone Systems.” The diacritic convention is included in the International Phonetic Alphabet (IPA); it is the convention used throughout this article. As seen from the example in Table 2, if the inventory of distinctive patterns is rich, then it may be necessary to combine diacritics. This is further illustrated by the falling tone patterns of Shilluk in Table 1. There are four falling tone patterns in Shilluk, so that the standard diacritic for a fall (cv̂c) is insufficient. I use it to represent the Low Fall. The other falling contours are the High Fall (cv̂́c), the High Fall to Mid (cv̂́c̄), and the Late Fall (cv́c̀). Such combinations of diacritics will only be used when the system of contrasts requires us to. The second convention to represent tone patterns is Chao tone “letters” (Chao, 1930). This convention is also part of the IPA. In this approach, the speaker’s range is represented by a vertical line, and the tone pattern by a line shape within this range.

Table 2. Illustration of Different Systems of Transcription, on the Basis of the Inventory of Contrastive Tone Patterns in Black Miao (Kuang, 2013)

[Table body not recoverable from the source. Columns: Black Miao tones; diacritics (IPA); Chao tone letters (IPA); numbers (1 is low). Surviving row labels: extra high, extra low, high rise, low rise.]

Note: The rightmost column shows Kuang’s transcriptions.

The third convention uses numbers. It is used most in the study of Asian tone languages, among others by Kuang (2013). Here the speaker’s range is represented by the scale from 1 to 5, on the hypothesis that five is the maximum number of level tones that a language can distinguish (cf. Section 3, “The Typology of Tone Systems”). Level tones are represented by two instances of the same number; falling and rising tone patterns are represented by decreasing and increasing numbers, respectively. Where required, a third number can be included to denote a more complex shape. In another variant of the numeric convention, the correspondence between the range from 1 to 5 and the speaker’s range is reversed, so that 1 corresponds to the top of the speaker’s range. This is the approach used in Pike (1948), and it is widespread in the study of languages of the Americas.
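The two variants of the numeric convention can be related to each other with a short sketch. It assumes a five-point scale written as a string of digits; the function name `contour_shape`, the `one_is_low` flag, and the example strings are hypothetical and are not transcriptions from any of the cited studies.

```python
def contour_shape(digits, one_is_low=True):
    """Classify a numeric tone transcription such as "55", "51" or "13".

    With one_is_low=False the scale is reversed (1 is the top of the range),
    so each digit d is first mapped to 6 - d.
    """
    levels = [int(d) for d in digits]
    if not one_is_low:
        levels = [6 - value for value in levels]
    steps = [b - a for a, b in zip(levels, levels[1:])]
    if all(step == 0 for step in steps):
        return "level"
    if all(step <= 0 for step in steps):
        return "falling"
    if all(step >= 0 for step in steps):
        return "rising"
    return "complex"


print(contour_shape("55"))                    # same number twice
print(contour_shape("51"))                    # decreasing numbers, 1 is low
print(contour_shape("13"))                    # increasing numbers, 1 is low
print(contour_shape("15", one_is_low=False))  # reversed scale
print(contour_shape("214"))                   # three numbers, more complex shape
```

The same digit string therefore denotes opposite contours under the two variants, so a transcription in numbers is only interpretable once the direction of the scale is known.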

Note how both Chao tone letters and the numeric approach spell out tone patterns in greater phonetic detail than the diacritic convention. This is illustrated by the Fall of Black Miao. In the absence of a contrast with another falling contour, the diacritic / ̂/ would be used in the diacritic convention, that is, the general symbol for a falling contour. In contrast, the representations in Chao tone letters and in numbers require a more detailed specification as to where in the speaker’s tone range this fall is produced.

Links to Digital Materials

  • The International Phonetic Alphabet.

  • Manual of Articulatory Phonetics by W. A. Smalley. The exercises from chapter 2 of Smalley’s 1963 course offer the most extensive set of audio training materials currently available. The audio files contain both stimuli and correct answers, as explained in the usage notes. Digitized by P. Mertens and M. Van de Velde.

  • Xtone: Cross-Linguistic Tonal Database. Edited by D. Allison, L. M. Hyman, and D. Mortensen. A crowd-sourced resource, offering detailed information on the tone systems of about 65 languages. Xtone is an effective tool to find instances of particular tonal phenomena.

  • Maddieson, I. Tone. In The World Atlas of Language Structures Online. Edited by M. S. Dryer and Martin Haspelmath. Munich: Max Planck Digital Library. Chapter 13, a basic classification of 527 languages into those without tone, simple systems, and complex systems.


I thank the linguists who contributed audio data: Olle Engstrand (Swedish), Jianjing Kuang (Black Miao), Inga McKendry (Mixtec), and Amos Teo (Sumi). I am grateful to them both for sharing their data, and also for taking the time to answer my questions. I also thank Cédric Patin and Oliver Stegen, for answering questions on Saghala and Rangi, respectively, and Otto Gwado Ayoker, for making some of the Shilluk recordings.

Further Reading

Fromkin, V. (Ed.) (1978). Tone: A linguistic survey. New York: Academic Press.

Hyman, L. M. (2006). Word-prosodic typology. Phonology, 23, 225–257.

Hyman, L. M. (2014). How to study a tone language. Language Documentation and Conservation, 8, 525–562.

Karlsson, A., House, D., & Svantesson, J.-O. (2012). Intonation adapts to lexical tone: The case of Kammu. Phonetica, 69, 28–47.

Kuang, J. (2013). The tonal space of contrastive five level tones. Phonetica, 70, 1–23.

Odden, D. (1995). Tone: African languages. In J. A. Goldsmith, J. Riggles, & A. C. L. Yu (Eds.), The handbook of phonological theory (pp. 444–475). Chichester, U.K.: Blackwell.

Remijsen, B., & Ayoker, O. G. (2014). Contrastive tonal alignment in falling contours in Shilluk. Phonology, 31(3), 435–462.

Riad, T. (2006). Scandinavian accent typology. Sprachtypologie und Universalienforschung, 59(1), 36–55.

Snider, K. (2014). On establishing underlying tone contrast. In L. M. Hyman (Ed.), How to study a tone language? Language Documentation and Conservation, 8, 707–737.

Williams, E. S. (1976). Underlying tone in Margi and Igbo. Linguistic Inquiry, 7, 463–484.

Xu, Y. (1999). Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics, 27, 55–105.

Xu, Y., & Sun, X. (2002). Maximum speed of pitch change and how it may relate to speech. Journal of the Acoustical Society of America, 111, 1399–1413.

Zhang, J. (2002). The effects of duration and sonority on contour tone distribution: Typological survey and formal analysis. New York: Routledge.


Bateman, J. (1990). Iau segmental and tone phonology. Nusa, 32, 29–42.

Chao, Y.-R. (1930). A system of tone-letters. Le Maître Phonétique, 45, 24–27.

Chen, M. Y. (2000). Tone sandhi: Patterns across Chinese dialects. Cambridge, U.K.: Cambridge University Press.

Clements, G. N., & Ford, K. C. (1979). Kikuyu tone shift and its synchronic consequences. Linguistic Inquiry, 10, 179–210.

Cruz, E. (2011). Phonology, tone, and the functions of tone in San Juan Quiahije Chatino. (Doctoral dissertation). University of Texas at Austin.

DiCanio, C. T. (2009a). Itunyoso Trique. Journal of the International Phonetic Association, 40, 227–238.

DiCanio, C. T. (2009b). The phonetics of register in Takhian Thong Chong. Journal of the International Phonetic Association, 39, 162–188.

DiCanio, C., Amith, J. D., & Castillo García, R. (2014). The phonetics of moraic alignment in Yoloxóchitl Mixtec. In C. Gussenhoven, Y. Chen, & D. Dediu (Eds.), Proceedings of the 4th International Symposium on Tonal Aspects of Languages (Nijmegen, May 13–16, 2014). ISCA Archive.

Edmondson, J. A., & Gregerson, K. J. (1992). On five-level tone systems. In S. J. J. Hwang & W. R. Merrifield (Eds.), Language in context: Essays for Robert E. Longacre (pp. 555–576). Arlington: Summer Institute of Linguistics and University of Texas at Arlington.

Engstrand, O. (1995). Phonetic interpretation of the word accent contrast in Swedish. Phonetica, 52, 171–179.

Engstrand, O. (1997). Phonetic interpretation of the word accent contrast in Swedish: Evidence from spontaneous speech. Phonetica, 54, 61–75.

Fedden, S. (2007). A grammar of Mian, a Papuan language of New Guinea (PhD Diss.). University of Melbourne.

Goldsmith, J. A. (1976). Autosegmental phonology (PhD Diss.). Massachusetts Institute of Technology.

Gussenhoven, C. (2004). The phonology of tone and intonation. Cambridge, U.K.: Cambridge University Press.

Gussenhoven, C., & van der Vliet, P. (1999). The phonology of tone and intonation in the Dutch dialect of Venlo. Journal of Linguistics, 35, 99–135.

Haraguchi, S. (1977). The tone pattern of Japanese: An autosegmental theory of tonology. Tokyo: Kaitakusha.

‘t Hart, J., Collier, R., & Cohen, A. (1990). A perceptual study of intonation: An experimental-phonetic approach to speech melody. Cambridge, U.K.: Cambridge University Press.

Hombert, J.-M. (1978). Consonant types, vowel quality and tone. In V. Fromkin (Ed.), Tone: A linguistic survey (pp. 77–111). New York: Academic Press.

House, D. (2004). Pitch and alignment in the perception of tone and intonation. In G. Fant, H. Fujisaki, J. Cao, & Y. Xu (Eds.), From traditional phonology to modern speech processing: Festschrift for Professor Wu Zhongji’s 95th birthday (pp. 189–204). Beijing: Foreign Language Teaching and Research Press.

Hualde, J. I., Elordieta, G., Gaminde, I., & Smiljanić, R. (2002). From pitch-accent to stress-accent in Basque. In C. Gussenhoven & Natasha Warner (Eds.), Laboratory phonology 7 (pp. 547–584). New York: Mouton de Gruyter.

Hyman, L. M. (2006). Word-prosodic typology. Phonology, 23, 225–257.

Hyman, L. M. (2013). Issues in the phonology-morphology interface in African languages. In Ọ.-Ọ. Orie & K. W. Sanders (Eds.), Selected proceedings of the 43rd annual conference on African linguistics (pp. 16–25). Somerville, MA: Cascadilla Proceedings Project.

Hyman, L. M. (2014). How to study a tone language. Language Documentation and Conservation, 8, 525–562.

Hyman, L. M., & Kobepa, N. (2013). On the analysis of tone in Mee (Ekari, Ekaugi, Kapauku). Oceanic Linguistics, 52, 307–317.

Karlsson, A., House, D., & Svantesson, J.-O. (2012). Intonation adapts to lexical tone: The case of Kammu. Phonetica, 69, 28–47.

Kenstowicz, M. (1994). Phonology in generative grammar. Cambridge, MA: Blackwell.

Kuang, J. (2013). The tonal space of contrastive five level tones. Phonetica, 70, 1–23.

Leben, W. R. (1973). Suprasegmental phonology. MIT PhD Diss.

Liu, F., & Xu, Y. (2005). Parallel encoding of focus and interrogative meaning in Mandarin intonation. Phonetica, 62, 70–87.

Mazaudon, M., & Michaud, A. (2008). Tonal contrasts and initial consonants: A case study of Tamang, a “missing link” in tonogenesis. Phonetica, 65, 231–256.

McCawley, J. D. (1978). What is a tone language? In V. Fromkin (Ed.), Tone: A linguistic survey (pp. 113–131). New York: Academic Press.

McKendry, I. (2013). Tonal association, prominence, and prosodic structure in Southeastern Nochitlán Mixtec (PhD Diss.). University of Edinburgh.

Odden, D. (1995). Tone: African languages. In J. A. Goldsmith (Ed.), The handbook of phonological theory (pp. 444–475). New York: Blackwell.

Patin, C. (2009). Tone shift and tone spread in the Saghala noun phrase. Faits de Langue—Les Cahiers, 1, 230–247.

Pearce, M. (2009). Kera tone and voicing interaction. Lingua, 119, 846–864.

Pierrehumbert, J. (1980). The phonology and phonetics of English intonation (PhD Diss.). Massachusetts Institute of Technology.

Pike, K. L. (1948). Tone languages: A technique for determining the number and type of pitch contrasts in a language, with studies in phonemic substitution and fusion. Ann Arbor: University of Michigan Press.

Remijsen, B. (2013). Tonal alignment is contrastive in falling contours in Dinka. Language, 89, 297–327.

Remijsen, B., & Ayoker, O. G. (2014). Contrastive tonal alignment in falling contours in Shilluk. Phonology, 31(3), 435–462.

Remijsen, B., Miller-Naudé, C. L., & Gilley, L. G. (2015). Stem-internal and affixal morphology in Shilluk. In M. Baerman (Ed.), The handbook of inflection. Oxford: Oxford University Press.

Riad, T. (1998). Towards a Scandinavian accent typology. In W. Kehrein & R. Wiese (Eds.), Phonology and morphology of the Germanic languages (pp. 77–109). Tübingen: Max Niemeyer Verlag.

Riad, T. (2006). Scandinavian accent typology. Sprachtypologie und Universalienforschung, 59(1), 36–55.

Snider, K. (2014). On establishing underlying tone contrast. Language Documentation and Conservation, 8, 707–737.

Stegen, O. (2002). Derivational processes in Rangi. Studies in African Linguistics, 31, 129–153.

Sullivant, J. R. (2011). Tone alignment in San Juan Quiahije Chatino. Paper presented at The 40th Conference of the Linguistic Association of the Southwest (LASSO XL).

Teo, A. (2014). A phonological and phonetic description of Sumi, a Tibeto-Burman language of Nagaland. Melbourne, Australia: Australian National University: Asia-Pacific Linguistics open access monographs.

van der Hulst, H., & Smith, N. (Eds.). (1988). Autosegmental studies on pitch accent. Dordrecht, Netherlands: Foris.

Wedekind, K. (1983). A six-tone language in Ethiopia: Tonal analysis of Benč4non4 (Gimira). Journal of Ethiopian Studies, 16, 129–156.

Welmers, W. E. (1970). Igbo tonology. Studies in African Linguistics, 1(3), 255–278.

Williams, E. S. (1976). Underlying tone in Margi and Igbo. Linguistic Inquiry, 7, 463–484.

Xu, Y. (1999). Effects of tone and focus on the formation and alignment of f0 contours. Journal of Phonetics, 27, 55–105.

Xu, Y. (2001). Fundamental frequency peak delay in Mandarin. Phonetica, 58, 26–52.

Xu, Y., & Sun, X. (2002). Maximum speed of pitch change and how it may relate to speech. Journal of the Acoustical Society of America, 111, 1399–1413.

Yip, M. (1989). Contour tones. Phonology, 6, 149–174.

Yu, K. M. (2014). The experimental state of mind in elicitation: Illustrations from tonal fieldwork. Language Documentation & Conservation, 8, 738–777.

Zhang, J. (2002). The effects of duration and sonority on contour tone distribution: Typological survey and formal analysis. New York: Routledge.


(1.) Actually weight units or moras; this concept will be introduced in Section 5, “Contextual Tone Processes.”

(2.) I follow Riad’s analysis in transcribing the lexically specified tone pattern as a High tone and will continue to do so further on in this article. Note though, that the phonetic evidence suggests that a representation of this tone pattern as a fall may be more appropriate, given that Engstrand reported that “[t]he only positively specified feature of the Central Standard Swedish word accent contrast is an f0 fall on the primary stress vowel in [accent II] words” (Engstrand, 1997, p. 74). This is illustrated in Figure 3.