
Variation in Phonology

Summary and Keywords

Language is a system that maps meanings to forms, but the mapping is not always one-to-one. Variation means that one meaning corresponds to multiple forms, for example faster ~ more fast. The choice is not uniquely determined by the rules of the language, but is made by the individual at the time of performance (speaking, writing). Such choices abound in human language. They are usually not just a matter of free will, but involve preferences that depend on the context, including the phonological context. Phonological variation is a situation where the choice among expressions is phonologically conditioned, sometimes statistically, sometimes categorically. In this overview, we take a look at three studies of variable vowel harmony in three languages (Finnish, Hungarian, and Tommo So) formulated in three frameworks (Partial Order Optimality Theory, Stochastic Optimality Theory, and Maximum Entropy Grammar). For example, both Finnish and Hungarian have Backness Harmony: vowels must be all [+back] or all [−back] within a single word, with the exception of neutral vowels that are compatible with either. Surprisingly, some stems allow both [+back] and [−back] suffixes in free variation, for example, analyysi-na ~ analyysi-nä ‘analysis-ess’ (Finnish) and arzén-nak ~ arzén-nek ‘arsenic-dat’ (Hungarian). Several questions arise. Is the variation random or in some way systematic? Where is the variation possible? Is it limited to specific lexical items? Is the choice predictable to some extent? Are the observed statistical patterns dictated by universal constraints or learned from the ambient data? The analyses illustrate the usefulness of recent advances in the technological infrastructure of linguistics, in particular the constantly improving computational tools.

Keywords: free variation, lexically conditioned variation, gradience, Partial Order Optimality Theory, Stochastic Optimality Theory, Harmonic Grammar, Maximum Entropy Grammar, T-order, vowel harmony, Finnish, Hungarian, Tommo So, learnability, morphophonology, implicational universals

1. The Landscape of Phonological Variation

Variation is a central puzzle in linguistics.1 Language is a system that maps meanings to forms, but the mapping is not always one-to-one. One meaning may correspond to multiple forms (variation) and one form may correspond to multiple meanings (ambiguity). In both cases, the speaker/hearer must make a choice at the time of performance (speaking, understanding).2



Variation is closely related to gradience: when multiple variants (forms, meanings) are available it is common for one or some of them to be preferred over the others. Such preferences may be influenced by the nonlinguistic context, but more often than not grammatical factors also play a role. That is why variation and gradience matter to grammatical analysis.

The present overview is an attempt to outline some theoretical options involved in the analysis of phonological variation and to illustrate their consequences based on detailed empirical analyses. The discussion is organized as follows: Section 1 draws preliminary distinctions. Section 2 reviews three studies of variation in vowel harmony: Ringen and Heinämäki (1999; Finnish), Hayes and Londe (2006; Hungarian), and McPherson and Hayes (2016; Tommo So). This allows us to examine phonological variation in three languages and analyses in three different frameworks while keeping the empirical domain fairly similar across the examples. Section 3 addresses the relationship between free and lexically conditioned variation. Section 4 points out the existence of universal quantitative biases in variation and how such biases emerge from the models. Section 5 reviews two topics of current interest: globality versus locality in variation and the suggestion that variation in word order may in fact be phonological. Section 6 concludes the article.

Phonological variation involves free choice among linguistic forms that is in some way phonologically conditioned. Consider the following example from Finnish: two adjacent vowels /V1V2/ may join to become one syllable nucleus, that is, a diphthong, or they may split to become two separate syllable nuclei (Häkkinen, 1978; F. Karlsson, 1982, pp. 89, 137; Anttila & Shapiro, 2017). The two alternative syllabifications can appear in free variation, as illustrated in (2) by the vowel sequence /au/:


(2) [example not reproduced]

This example illustrates two points. First, variation occurs in a narrow environment where phonology fails to dictate a unique outcome: variation occurs only if the second vowel is [+high, +round] (= /u, y/) and neither vowel receives primary stress. All other environments are invariant: under primary stress the result is a diphthong, as in háu.ta ‘grave’; changing the second vowel to [−round] (= /i/) yields a diphthong, as in ái.kai.nen ‘early’; changing the second vowel to [−high] (= /o/) splits the vowel sequence into two syllables, as in káa.ka.o ‘cocoa’. In these environments there is no variation. It is the job of phonological analysis to explain why variation arises where it does. Predicting the loci of variation is an important first step in the study of phonological variation, although it is often skipped by researchers whose main interest is in the variation itself.

Second, while variation involves free choice among linguistic forms, the choice is hardly ever completely free. In the Finnish example, there is a known preference: rák.kau.den favors a diphthong, láu.ka.ùs.ta favors splitting. The generalization is that a diphthong is preferred if the vowels are immediately followed by a syllable boundary, whereas splitting is preferred if the vowels are followed by a coda consonant (Häkkinen, 1978). It is again the job of phonological analysis to explain this peculiar pattern. The key to the explanation lies in secondary stress, which falls (to a first approximation) on odd-numbered syllables, unless the syllable is final. Split syllabification provides an opportunity for secondary stress to apply: in rák.ka.ù.den secondary stress falls on a light (open) syllable, whereas in láu.ka.ùs.ta it falls on a heavy (closed) syllable. Since stressed lights are universally worse than stressed heavies (see, e.g., Prince, 1990) one can understand why split syllabification sounds worse in rák.ka.ù.den than in láu.ka.ùs.ta. Predicting not only where variation occurs, but also the degrees of relative well-formedness among the variants is another key task for the phonological study of variation.

Variation may occur both across and within speakers. In the first case, speakers differ, but each speaker may be individually invariant. This means that variation in a corpus may in fact result from pooling data across several invariant speakers. In the Finnish syllabification example, the variation is probably speaker-internal. For example, the poet V. A. Koskenniemi used all four types of syllabifications in Elegioja, a collection of verse published in 1917. Verse is a useful source of data because syllabification is easy to detect against the backdrop of meter. In a study of optional schwa deletion in French, Bayles, Kaplan, and Kaplan (2016) found that individual speakers vary and that there is also variation between speakers in the frequency of schwa’s appearance.

Variation may be either subsymbolic or symbolic. Subsymbolic variation involves a single structure realized in a scalar or gradient fashion. This kind of variation is familiar to phoneticians and those sociolinguists who self-identify as sociophoneticians. An example is the absolute duration of aspiration in English voiceless stops. Gradient variability is often used to express paralinguistic meanings: raising of voice can be used to signal anger or surprise; raising the voice a lot can signal violent anger or great surprise (Ladd, 1996, pp. 36–41). From the perspective of generative grammar, this kind of variation is part of the performance overlay that accompanies all language behavior and is an interesting object of study in itself. Drawing the distinction between competence and performance can be difficult in particular cases; for an engaging discussion, see Pierrehumbert (1994).

Symbolic variation involves a probabilistic choice among discrete variants. To continue with another Finnish example, consider vowel coalescence in colloquial Finnish, for example, /makea/ → má.ke.a ~ má.kee ‘sweet’ (Kiparsky, 1993a; Paunonen, 1995; A. Anttila, 2009). This variable process resolves a hiatus between adjacent unstressed vowels of which the second is low by deleting the second vowel and lengthening the first. In a word with two potential variation sites we get four variants: /usea-mpA-i-tA/ ‘many-comp-pl-par’ → ú.se.àm.pi.a ~ ú.seem.pì.a ~ ú.se.àm.pii ~ ú.seem.pii. Phonetically the variation is gradient as boundaries between vowels are never quite sharp and adjacent vowels often smear into each other. Phonologically, the variation is categorical in a sense obvious to native speakers. There are exactly four variants, not three or five. Thus, one can sensibly ask: “Did you say useampia or useempia?” The choice is binary: speakers know that the variation manipulates categorical phonological entities, such as the phonemic difference between /a/ and /e/. The variants also differ in syllable count: ú.se.àm.pi.a has five syllables, ú.seem.pì.a has four. Suggesting that careful phonetic measurements might reveal that the variation is in fact continuous and involves variants with, say, 4.2 or 4.7 syllables would be plainly absurd. What is phonetically continuous can be phonologically discrete, and here variation takes place between discrete categories. In practice, it may not always be easy to draw a distinction between symbolic and subsymbolic variation; see, for example, Ernestus (2011) and Ernestus and Baayen (2011); for the general problem of the phonology/phonetics boundary, see Ladd (2011); for a unified approach to symbolic and subsymbolic phenomena, see Flemming (2001).

The picture is further complicated by the fact that phonology is not a monolithic system. Phonology is often taken to consist of lexical and postlexical components which may contain overlapping processes; see, for example, Kiparsky (1982, 1985, 2015) and Mohanan (1986). For example, English nasal place assimilation is invariant in the lexical phonology (within words), as in a[m]ber/*a[n]ber, but variable in the postlexical phonology (across words), as in gree[n] boat ~ gree[m] boat, with differences in the details of the processes; see, for example, Kiparsky (1985, p. 86); Coetzee and Pater (2011, pp. 404–405); and Coetzee (2016). A similar example is English palatalization, for example, confe[s], confe[ʃ]ion (invariant, symbolic, lexical) versus I mi[s] you ~ I mi[ʃ] you (variable, subsymbolic, postlexical); see, for example, Zsiga (1995). The symbolic vs. subsymbolic distinction is related to the lexical versus postlexical distinction, but does not coincide with it. This modular architecture suggests that observed variation may be a composite of several kinds of variation distributed across modules, with the results folded together in speech. A showcase example is the variable t/d deletion in English, for example, it cost ~ cos’ me five dollars, which has been analyzed as the result of variation at two lexical levels and one postlexical level; see, for example, Guy (1991, 2011); Kiparsky (1993b); Myers (1995); Coetzee (2004); and Bermúdez-Otero (2010).

A standard diagnostic for lexical processes is that they may have morphological or lexical conditions. Finnish vowel coalescence is clearly a lexical process: it applies to native nouns like hopea ~ hopee ‘silver’, lipeä ~ lipee ‘lye’, and häpeä ~ häpee ‘shame’, but not to recent borrowings like idea/*idee ‘idea’ or oseanografia/*oseenografia ‘oceanography’. It is also more likely to apply across morpheme boundaries than within roots. For example, /ia/ variably coalesces across morpheme boundaries (/lasi-tA/ → lasia ~ lasii ‘glass-par’), but never within roots (/rasia/ → rasia/*rasii ‘box’). Here the freedom of choice characteristic of a variable process is sharply curtailed by morphological and lexical factors, clearly showing that variation is a matter of grammar instead of mere performance. A special type of lexical condition is lemma frequency, which seems to be correlated with all kinds of processes: symbolic and subsymbolic, lexical and postlexical. We cannot hope to do justice to the large literature on frequency effects here, but see, for example, Bybee (2001, 2002); Jurafsky, Bell, Gregory, and Raymond (2001); Pierrehumbert (2001); Gahl (2008); Frisch (2011); Coetzee and Kawahara (2013); and references there.

How has phonological theory responded to free variation? In broad strokes, current approaches can be grouped into two types based on what they assume about the speaker’s internalized grammar. One view holds that variation is essentially noise. Every speaker has a single grammar, often expressed in terms of numerically weighted constraints. Every time the person speaks, they add a little noise to each of these numbers, jiggling the weights a bit at random, hence variation. This is the view taken in Stochastic Optimality Theory (Boersma, 1997; Boersma & Hayes, 2001) and Noisy Harmonic Grammar (Jesney, 2007; Coetzee & Pater, 2011; Coetzee & Kawahara, 2013; Boersma & Pater, 2016; Hayes, 2017). An alternative view holds that variation is structural. For example, in Partial Order Optimality Theory (Kiparsky, 1993b; Reynolds, 1994; Nagy & Reynolds, 1997; A. Anttila, 1997; Anttila & Cho, 1998; Zamma, 2012; Djalali, 2017) a single speaker controls multiple grammars defined in terms of partially ranked constraints. The choice among the grammars may be random or conditioned by factors outside phonology, such as morphology, the lexicon, or sociolinguistics. Other structural models include Coetzee’s (2004, 2006) Rank-Ordering Model, where the constraints are totally ranked, but the theory predicts multiple outputs rank-ordered in terms of their well-formedness, and Kaplan’s (2011) Markedness Suppression Theory where free variation is derived from the variable suppression of markedness violations.

This taxonomy is clearly imperfect. The two approaches are not nearly as different as they may initially seem, making translation between them often possible. However, they represent two powerful guiding intuitions. When systematically developed, they lead to very different analyses and ultimately theories about human language. We will illustrate this by concrete examples in the subsequent sections.

2. Variation in Vowel Harmony

This section introduces three different approaches to phonological variation, focusing on variation in vowel harmony. Basic familiarity with Optimality Theory (OT, Prince & Smolensky, 1993/2004) will be assumed; for a concise introduction, see McCarthy (2007); for an overview of current approaches to vowel harmony, see Walker (2012).

2.1 Finnish

Finnish has eight vowel phonemes: the back vowels /a, o, u/, the front vowels /ä, ö, y/, and the phonetically front, but phonologically neutral vowels /i, e/. In native simplex words, front and back vowels do not co-occur, but neutral vowels are compatible with both. Thus, kylmä ‘cold’ and kulma ‘corner’ both exist, but *kylma and *kulmä sound unacceptable because they mix front and back vowels. In contrast, pesu ‘washing’ and kesy ‘tame’ are both fine because neutral vowels are compatible with both back and front vowels. Vowel harmony is not only a static phonotactic constraint on roots, but it triggers phonological alternations in suffix vowels, forcing them to share the harmony value of the root.

Vocabulary borrowed from languages with no vowel harmony, notably Swedish, is not always adjusted to the demands of vowel harmony, especially not in the educated standard language. This results in the existence of disharmonic roots. Thus, analyysi ‘analysis’, hieroglyfi ‘hieroglyph’, marttyyri ‘martyr’, and afääri ‘affair’ all sound unremarkable. What happens to suffixes after such disharmonic roots? In an experimental study, Ringen and Heinämäki (1999), henceforth R&H, found variable vowel harmony statistically conditioned by stress, vowel length, and vowel sonority. This is significant because such phonological conditions are not visible in the invariant vowel harmony data, underlining the importance of variation for phonological theory. For a replication and extension to nonce words, see Kimper and Ylitalo (2012).

R&H’s key claims are the following. With native roots, suffixes show no variation. Disharmonic roots trigger two kinds of behavior: if the last harmonic vowel is back, the suffix vowel is back, for example, /syntaksi-nA/ → syntaksina/*syntaksinä ‘syntax-ess’; if the last harmonic vowel is front we get variation, for example, /afääri-nA/ → afäärina ~ afäärinä ‘affair-ess’. The variation is not completely free, but exhibits preferences that depend on the length of the root. For example, the three-syllable root /marttyyri/ ‘martyr’ allows both front and back about fifty-fifty, that is, marttyyrina ~ marttyyrinä ‘martyr-ess’, whereas the four-syllable root /miljonääri/ ‘millionaire’ is almost always front, that is, miljonäärinä ~ ??miljonäärina ‘millionaire-ess.’ Why should the length of the root measured in syllables matter to preferences in vowel harmony? We will return to this question shortly.

R&H’s analysis is cast within Partial Order OT. They start by assuming that the constraints in (3) are undominated and not violated in Finnish.3


(3) [constraints not reproduced]

The first constraint gives special protection to harmonic root vowels; for precedents, see, for example, McCarthy and Prince (1995) and Beckman (1997). The second rules out the back counterparts of the neutral vowels /i, e/ that do not exist in Finnish. The fact that neutral vowels are precisely those vowels whose back counterparts are missing from the inventory and where backness is thus not distinctive has been pointed out many times; see, for example, Kiparsky (1985, p. 115) and Kiparsky and Pajusalu (2003) for theoretical interpretation.

Following the usual practice in autosegmental phonology, R&H assume that the features [+back] and [−back] are linked to the vowels by association lines. The constraints responsible for spreading harmony left to right are shown in (4).


(4) [constraints not reproduced]

From the variation perspective, the most interesting constraints are those in (5). They encode R&H’s key observation: the choice of suffix harmony is sensitive to the harmony values of the stressed vowels as well as the harmony value of the most sonorous vowel of the root. Note that R&H’s sonority hierarchy conflates length, height, and rounding.


(5) [constraints not reproduced]

The following tableau presents two illustrative examples: /syntaksi-nA/ ‘syntax-ess’ where we only get the back suffix and /analyysi-nA/ ‘analysis-ess’ where we get variation. To save space, the two undominated constraints in (3) and the candidates that violate them have been suppressed. We arbitrarily assume that the suffix vowel is underlyingly [−back]. The choice does not matter since there are no faithfulness constraints in the tableau.


(6) [tableau not reproduced]

The constraint NoInt+b strives to align [+back] with the right word edge. This constraint dominates the remaining four constraints Prim, Sec, Son, and NoInt−b, which are mutually unranked and responsible for the variation. At the time of performance, the speaker randomly picks a total ranking from among the 4! = 24 possible total rankings of the four constraints (with replacement). No variation is predicted in sýntaksìna because all 24 total rankings favor the back variant. In contrast, variation is predicted in ánalỳysina ~ ánalỳysinä: Prim and Son favor back harmony, whereas Sec and NoInt−b favor front harmony. Depending on which constraint ranks highest in the selection, we get either back or front harmony. This is the source of variation across speech events. In this particular case, the 24 total rankings are evenly divided between back (ánalỳysina) and front (ánalỳysinä), with each variant winning under 12 total rankings, implying that the chances of selecting back versus front are even. This is not always the case, as we will see shortly.4
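To make the counting concrete, the sketch below enumerates the 4! = 24 total rankings of the four unranked constraints and tallies how many of them select each suffix variant. The violation profiles are illustrative stand-ins consistent with the description above, not R&H’s actual tableau, and the script is an independent sketch rather than any published implementation.

```python
from itertools import permutations

# Illustrative violation profiles: Prim and Son prefer the back suffix,
# Sec and NoInt-b prefer the front suffix (hypothetical numbers).
candidates = {
    "analyysina": {"Prim": 0, "Sec": 1, "Son": 0, "NoInt-b": 1},  # back suffix
    "analyysinä": {"Prim": 1, "Sec": 0, "Son": 1, "NoInt-b": 0},  # front suffix
}

def winner(ranking, candidates):
    """Standard OT evaluation under one total ranking."""
    pool = list(candidates)
    for constraint in ranking:
        best = min(candidates[c][constraint] for c in pool)
        pool = [c for c in pool if candidates[c][constraint] == best]
        if len(pool) == 1:
            break
    return pool[0]

# Ranking volume: how many of the 24 total rankings pick each candidate.
volumes = {c: 0 for c in candidates}
for ranking in permutations(["Prim", "Sec", "Son", "NoInt-b"]):
    volumes[winner(ranking, candidates)] += 1

for cand, vol in volumes.items():
    print(cand, vol, f"{vol / 24:.0%}")   # 12 and 12, i.e. 50% each
```

Restricting the count to the total rankings that respect an added pairwise ranking such as Prim >> Sec (by discarding the permutations that violate it) shifts the predicted proportions, which is how the adjusted predictions discussed below arise.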

R&H further subscribe to the hypothesis that the number of total rankings predicting a candidate, also called the candidate’s ranking volume (Riggle, 2010), is proportional to its relative well-formedness and hence its likelihood of being used by the speaker. Sample quantitative predictions for four disharmonic roots are shown in (7). These predictions were computed with OTOrder (Djalali & Jeffers, 2015), a web application for working with Partial Order OT.


(7) [quantitative predictions not reproduced]

This is in the right direction, but there is room for improvement. R&H arrive at a remarkably good fit by taking into account an old observation that goes back to at least Sadeniemi (1949, p. 48): loanwords are sometimes treated as pseudo-compounds, for example, hiero=glyfi. In this case, the prediction is that under the pseudo-compound analysis the suffix vowel should be invariably front, no matter what ranking is used, because glyfi only has front vowels. In contrast, afääri cannot be analyzed as a pseudo-compound *a=fääri because a compound must consist of phonological words and a phonological word must contain at least two vocalic moras, that is, a moraic trochee. The string a is too short because it only has one vocalic mora. This means that the compound analysis is possible for hieroglyfi, but not for afääri. This explains the observation that word length in syllables matters to preferences in vowel harmony.

In addition to variation within an individual, there may be variation across individuals (see, e.g., Välimaa-Blum, 1999). From R&H’s perspective, this means that individuals may differ in their rankings. For example, a speaker might consider primary stress a more important determinant of vowel harmony than secondary stress. This would be empirically reflected as a higher proportion of back variants in words where primary stress falls on a back vowel. Adding the pairwise ranking Prim >> Sec into the grammar would adjust the quantitative predictions in the desired way: the predicted proportions would change in the two stems where primary stress falls on /a/: á.na.lỳy.si.nä 25% ~ á.na.lỳy.si.na 75% and á.fää.rì.nä 50% ~ á.fää.rì.na 50%. Another speaker might insist on front harmony for all four roots and thus only allow á.na.lỳy.si.nä, hí.e.ro.glỳ.fi.nä, á.fää.rì.nä, and míl.jo.nä̀ä.ri.nä. Such a speaker would require at least two additional pairwise rankings, for example, NoInt−b >> Son and NoInt−b >> Prim (see (8)).


(8) [rankings not reproduced]

These grammars are literally more complex than the baseline grammar with no rankings among the four constraints because they contain additional ranking information. The totally ordered tableau of classical OT is the most complex case of all: every constraint is ranked with respect to every other constraint. However, there are limits to what adding rankings can accomplish. While front harmony across the board can be obtained by adding rankings, back harmony across the board is out of reach: there is no ranking that would generate the hypothetical dialect á.na.lỳy.si.na, hí.e.ro.glỳ.fi.na, á.fää.rì.na, and míl.jo.nä̀ä.ri.na, with only back vowel suffixes. This is not only because míl.jo.nä̀ä.ri.na is harmonically bounded: it turns out that there is no ranking that simultaneously predicts invariant back harmony in both hí.e.ro.glỳ.fi.na and á.fää.rì.na, even though both are individually predicted to be possible. Where does such a strange restriction come from? More generally, what kinds of dialects and quantitative patterns are possible given a set of constraints? We will return to these questions shortly.

R&H’s analysis is insightful, but leaves some puzzles unsolved. One is the interaction of vowel harmony with morphology and foot structure. In their discussion of Balto-Finnic vowel harmony typology, Kiparsky and Pajusalu (2003, p. 224) summarize the syndrome as follows: “The overarching generalization here is that harmonic constraints may be stricter in derived environments than morpheme-internally, and stricter in non-initial feet.” In Finnish, a disyllabic root with a neutral vowel in the first syllable allows a front/back contrast in the second syllable, for example, sinä ‘you’ versus kina ‘squabble’, riittä- ‘suffice’ versus viitta ‘cloak’. This is correctly predicted by R&H’s high-ranked root faithfulness constraint. Unfortunately, the same ranking also predicts that vowel harmony should be freely violated within roots. One can fairly object that R&H’s analysis was never intended to capture root harmony, but since root and suffix harmony are clearly related, the analysis should be able to say something sensible about both.

The analysis also falls short of predicting that the front/back contrast is neutralized outside the initial (= main stress) foot, even in roots, where only front vowels occur: (kípi)nä ‘spark’, (lípe)ä ‘lye’, and (éte)lä ‘south’ exist, but *(kipi)na, *(lipe)a, and *(ete)la sound unacceptable (F. Karlsson, 1982, p. 103). Kiparsky and Pajusalu (2003, p. 223) propose that there exists a constraint Ident-F1(Back) that gives special protection to the initial foot. Apparent exceptions include recent loans like ídea where the last vowel is [+back]. Recall that there is another phonological process that is also blocked in this word: vowel coalescence does not apply in idea/*idee. Together these facts suggest that recently borrowed nouns differ from native nouns in being exhaustively footed and that vowel coalescence only applies to extrametrical vowels, an analysis that generalizes to other alternations as well; see A. Anttila (2006, pp. 912–915) and Pater (2009a, pp. 141–142). It seems that old Finno-Ugric nouns have disyllabic feet, possibly followed by a final extrametrical syllable, for example, (lípe)ä ‘lye’, hence vowel coalescence (lipe)ä ~ (lipe)e and only front harmony, whereas recently borrowed nouns are exhaustively footed, for example, (ídea), hence no coalescence (idea)/*(idee) and the possibility of back harmony. This is an instance of lexical stratification (see, e.g., Itô & Mester, 1995, 1999): different layers of vocabulary have different metrical structures.

The fact that even some suffixes allow both front and back variants in the initial foot after neutral vowels conclusively shows that R&H’s analysis must be augmented by foot structure constraints. The front/back contrast sometimes comes out as free variation, as in (míeh-uus) ~ (míeh-yys) versus (míehe)-ys/*(miehe)-us ‘manliness’, compare with (naise)-us ‘womanliness’; (píen-uus) ~ (píen-yys) versus (píene)-ys/*(piene)-us ‘smallness’, compare with (valke)-us ‘whiteness’. Following Zuraw (2000, 2010), we could assume that the variable suffix has two listed forms, [+back] and [−back], and that Ident-F1(Back) protects both, yielding the variation (míe.huus) ~ (míe.hyys). With longer stems, the suffix must be front as in (míe.he)ys, because the putative back vowel suffix *(míe.he)us gets no foot protection. This analysis crucially assumes that /i, e/ are specified as [−back] and are not underspecified on the surface.

The same analysis applies to lexical exceptions like víit-tä ‘five-par’ and vét-tä ‘water-par’ (front harmony, the general pattern) versus mér-ta ‘sea-par’ and vér-ta ‘blood-par’ (back harmony, the exceptional pattern), where the front/back choice depends on the lexical identity of the root, a problem we will soon encounter in Hungarian. This suggests that entire words can be lexically listed with the appropriate vowel harmony variant. The fact that these exceptional words are disyllabic is no accident: free variation and lexical conditioning share the property that they are permitted within the main stress foot where phonology fails to dictate a unique outcome. In other words, lexical conditioning emerges in phonologically predictable environments. The same is true of symbolic free variation.

2.2 Hungarian

We now turn to variation in Hungarian front/back vowel harmony studied by Hayes and Londe (2006), henceforth H&L. Hungarian has the following vowel inventory: the front rounded vowels /y, y:, ø, ø:/, which H&L label “F”, the back vowels /u, u:, o, o:, ɔ, a:/ labelled “B”, and the phonetically front, but phonologically neutral unrounded vowels /i, i:, ɛ, e:/, labelled “N”. Vowels within a phonological word usually agree in backness, but in a number of stems front and back vowels co-occur, much as in Finnish. Siptár and Törkenczy (2000, pp. 63–74) provide an overview. For the role of consonants in Hungarian vowel harmony, see Hayes, Siptár, Zuraw, and Londe (2009).

H&L illustrate front/back harmony using the dative suffix, for example, falnak [fɔl-nɔk] ‘wall-dat’ versus kertnek [kɛrt-nɛk] ‘garden-dat’. Harmony is invariant in several environments. For example, if the last stem vowel is either F or B it uniquely determines the suffix harmony, even in disharmonic stems which are usually recent loans: glükóznak [glyko:z-nɔk] ‘glucose-dat’ (FB) versus sofőrnek [ʃofø:r-nɛk] ‘chauffeur-dat’ (BF). Similarly, if a front harmonic vowel precedes a string of one or more neutral vowels at the end of the stem, for example, FN and FNN, the result is invariant front harmony: fűszernek [fy:sɛr-nɛk] ‘spice-dat’ (FN) versus őrizetnek [ø:rizɛt-nɛk] ‘custody-dat’ (FNN).

Variation appears under specific phonological conditions. Stems with all N vowels show lexically conditioned variation: most take front suffixes, for example, címnek [tsi:m-nɛk] ‘address-dat’ (N) and repesznek [rɛpɛs-nɛk] ‘splinter-dat’ (NN), but a few dozen mostly monosyllabic stems take back suffixes: hídnak [hi:d-nɔk] ‘bridge-dat’ (N) and deréknak [dɛre:k-nɔk] ‘waist-dat’ (NN). Siptár and Törkenczy (2000, p. 68) put the number of such stems at 60.5 Both free and lexically conditioned variation are found in stems where the last harmonic vowel is back, followed by some number of neutral vowels, for example, BN and BNN. Some such stems take invariant back suffixes, for example, havernak [hɔvɛr-nɔk] ‘pal-dat’; other stems take invariant front suffixes, for example, mutagénnek [mutɔge:n-nɛk] ‘mutagen-dat’, and yet others vacillate: arzénnek [ɔrze:n-nɛk] ~ arzénnak [ɔrze:n-nɔk] ‘arsenic-dat’. It is this last phonological class of stems, with both lexical and free variation, that is the focus of H&L’s study; see also Ringen and Kontra (1989).

H&L’s main finding is that the choice of front/back suffix with BN or BNN stems is phonologically conditioned, not categorically, but statistically. H&L find evidence for two generalizations. First, the low vowel [ɛ] occurs with front suffixes more often (i.e., in proportionately more stems) than the mid vowel [e:], which occurs with front suffixes more often than the high vowels [i] and [i:]. H&L call this generalization the height effect. Second, the number of neutral vowels matters: BNN stems take front suffixes more often than do BN stems. H&L call this generalization the count effect. Finally, the two effects add up: when both the height and count effects are maximally present there is no variation. Thus, all BNɛ stems take front suffixes (Siptár & Törkenczy, 2000, p. 71). H&L establish these empirical generalizations by three different methods: Internet search, elicitation based on actual words, and elicitation based on nonce words (wug test).

H&L posit 10 constraints. Markedness constraints record violations of agreement in [±back] between vowel pairs: local agreement cares about vowels that are adjacent in the sense of not being separated by other vowels; distal agreement considers all vowel pairs within a word. The following four agreement constraints are posited in (9).


(9) [constraints not reproduced]

Additional agreement constraints for individual front vowels are required to account for the height effect: Local[i], Local[e:], and Local[ɛ]. The following constraint for NN sequences is needed to account for the count effect in (10).


(10) [constraint not reproduced]

Finally, H&L posit a special root faithfulness constraint. Recall that the same constraint was adopted in R&H’s analysis of Finnish.


(11) [constraint not reproduced]

H&L rank these constraints in two steps. First, the space of variation is defined by high-ranking constraints that rule out illegal forms. A version of the Constraint Demotion Algorithm described in Hayes (2004) and Prince and Tesar (2004) yields the five strata shown in (12).


(12) [constraint strata not reproduced]

Ranking Ident-IO[back]root highest correctly predicts that suffixes alternate, not roots. It also allows disharmonic roots (BF, FB, BN, NB). The problem is that Hungarian does exhibit root harmony, but that goes unexplained, as H&L note. Recall that the same problem appeared in R&H’s analysis of Finnish. Another key ranking puts the local agreement constraints above the distal ones. That accounts for the generalization that the crucial harmonic vowel is the last root vowel, for example, glükóznak [glyko:z-nɔk] ‘glucose-dat’ (FB), sofőrnek [ʃofø:r-nɛk] ‘chauffeur-dat’ (BF). These rankings are needed to rule out variants that are plainly unacceptable, setting the limits to what can potentially vary.

The key interactions are illustrated in Table 3. Four types of examples are included: an FB stem with invariant back harmony; a BN stem with invariant lexically specified back harmony; a BN stem with lexically specified variation; and a BN nonce word with stochastic variation.


[Table 3 not reproduced]

The root ‘glucose’ does not allow variation because its last vowel is harmonic and the constraint Local[B] categorically selects the back variant. The roots ‘steel’ and ‘arsenic’ are different in that they end in a neutral vowel. Recall that such stems fall into three types: some take invariant back suffixes, ‘steel’ being one of them; other stems take invariant front suffixes (no examples given here); yet other stems vacillate, including ‘arsenic’. Following Zuraw (2000), H&L assume that inflected forms are fully listed and protected by Ident-IO[back]. Thus, the dative of ‘steel’ is simply listed as /ɔtse:l-nɔk/ ‘steel-dat’, with a back suffix, and any attempt to change that will result in a violation of Ident-IO[back]. The vacillator ‘arsenic’ is listed with both front and back variants, /ɔrze:n-nɛk/ and /ɔrze:n-nɔk/ ‘arsenic-dat’, and the faithful candidate wins in both cases. Finally, nonce forms have no lexical entries, which makes them invisible to faithfulness. The selection of the winner falls upon the stochastically ranked markedness constraints. The tableau only shows two such constraints: the ranking Distal[B] >> Local[e:] yields the back variant [ha:de:l-nɔk]; the reverse ranking Local[e:] >> Distal[B] yields the front variant [ha:de:l-nɛk]. Stochastic ranking is indicated by a dashed line between these two constraints. Note that the analysis restricts both lexically conditioned variation (in real words) and free variation (in nonce words) to specific phonological environments by high-ranked constraints that dominate both Ident-IO[back] and the stochastically ranked markedness constraints. In this way, the analysis expresses the fundamental intuition that lexically conditioned variation and free variation emerge in the same restricted phonological environment.

The remaining question is how to rank the bottom five constraints that are responsible for the quantitative patterns in nonce words. H&L adopt Stochastic Optimality Theory (Boersma & Hayes, 2001), which differs from classical OT in that constraints have real-number ranking values: constraints are not just ranked C1 >> C2 >> C3, but they may have values like C1 = 105, C2 = 100.1, C3 = 99.9, where C1 and C2 are relatively far apart, but C2 and C3 are quite close to each other. Stochastic OT assumes that an individual has a single grammar, that is, a set of constraints with fixed ranking values, and that variation arises at the moment of speaking when normally distributed noise is added to or subtracted from each constraint’s ranking value. At the moment of speaking, the speaker draws a sample from each constraint’s normal distribution and obtains its actual ranking value, also called its selection point. A common outcome might be 104.5, 100.0 and 99.7, which results in the ranking C1 >> C2 >> C3, but we might also get 104.5, 99.8, and 100.0, which yields the ranking C1 >> C3 >> C2 where C2 and C3 are reversed. Since the outcome varies from one evaluation to the next, the result is free variation. Constraints whose ranking values lie close to each other are more likely to be reversed than constraints that lie far apart, assuming that the noise added to each constraint has the same standard deviation. The high-ranking C1 may be virtually categorical and never violated in the language, with variation limited to cases where violations of C2 can be traded off against violations of C3. Stochastic OT is thus a paradigm example of a theory where variation is interpreted as noise superimposed on a single invariant grammar.
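The sampling procedure can be made concrete with a short sketch using the toy ranking values just mentioned; the noise standard deviation (here 2.0) and the number of trials are assumptions made for illustration.

```python
import random

# Illustrative ranking values from the discussion above.
ranking_values = {"C1": 105.0, "C2": 100.1, "C3": 99.9}
NOISE_SD = 2.0  # assumed evaluation noise, identical for every constraint

def sample_ranking(values, sd=NOISE_SD):
    """One evaluation: draw a selection point per constraint and sort."""
    points = {c: random.gauss(v, sd) for c, v in values.items()}
    return sorted(points, key=points.get, reverse=True)

# Estimate how often C2 ends up above C3 across many evaluations.
trials = 10_000
c2_over_c3 = 0
for _ in range(trials):
    ranking = sample_ranking(ranking_values)
    if ranking.index("C2") < ranking.index("C3"):
        c2_over_c3 += 1
print(f"C2 >> C3 in about {c2_over_c3 / trials:.0%} of evaluations")  # roughly 53%
```

With these values, C2 and C3 are reranked on a large minority of evaluations, while C1 stays on top the great majority of the time, mirroring the description above.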

From the very beginning, Stochastic OT was equipped with an associated learning algorithm, the Gradual Learning Algorithm (GLA); see, for example, Boersma (1997) and Boersma and Hayes (2001). The GLA is designed to find ranking values that match the frequencies observed in the training data, given constraints, candidates annotated with their empirical frequencies, and violation marks; for criticism and extensions, see Pater (2008) and Magri (2012). In their analysis of Hungarian, H&L fine-tuned the lowest stratum of five constraints by training the GLA on their Internet corpus, using OTSoft (Hayes, Tesar, & Zuraw, 2013). The ranking values Local[ɛ] = 105.176, Local[NN] = 103.313, Distal[B] = 101.430, Local[e:] = 98.315, and Local[i] = 95.076, found by the GLA, predict frequencies that closely match the observed frequencies in H&L’s nonce word experiment (wug test). H&L have therefore demonstrated, not only that there exists a Stochastic OT grammar that can generate the wug test data, but also that such a grammar can be algorithmically learned, suggesting that it is reasonable to assume that Hungarian speakers, too, can extract statistical phonological regularities from actual Hungarian data (the Internet corpus) and apply this knowledge to made-up words they have never heard before (the wug test).
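The GLA’s error-driven update can be sketched as follows, assuming the standard symmetric formulation: when the learner’s own (noisily evaluated) output differs from the observed datum, constraints preferring the erroneous output are demoted and constraints preferring the datum are promoted by a small plasticity value. The constraint names, violation profiles, and training frequencies below are invented for illustration and are not H&L’s OTSoft run.

```python
import random

def noisy_output(values, candidates, sd=2.0):
    """Generate the learner's output by stochastic evaluation."""
    points = {c: random.gauss(v, sd) for c, v in values.items()}
    ranking = sorted(points, key=points.get, reverse=True)
    pool = list(candidates)
    for constraint in ranking:
        best = min(candidates[c][constraint] for c in pool)
        pool = [c for c in pool if candidates[c][constraint] == best]
    return pool[0]

def gla_update(values, candidates, observed, plasticity=0.1):
    """On an error, demote constraints that prefer the learner's output
    and promote constraints that prefer the observed datum."""
    produced = noisy_output(values, candidates)
    if produced == observed:
        return
    for c in values:
        if candidates[produced][c] < candidates[observed][c]:
            values[c] -= plasticity   # this constraint helped the wrong form win
        elif candidates[produced][c] > candidates[observed][c]:
            values[c] += plasticity   # this constraint favors the observed form

# Invented training data: the observed form is "front" about 70% of the time.
values = {"Prim": 100.0, "Sec": 100.0, "Son": 100.0, "NoInt-b": 100.0}
cands = {
    "back":  {"Prim": 0, "Sec": 1, "Son": 0, "NoInt-b": 1},
    "front": {"Prim": 1, "Sec": 0, "Son": 1, "NoInt-b": 0},
}
for _ in range(5000):
    datum = "front" if random.random() < 0.7 else "back"
    gla_update(values, cands, datum)
print(values)  # the front-preferring constraints should drift upward
```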

H&L’s analysis of Hungarian differs from R&H’s analysis of Finnish in several ways. One key difference is that H&L analyze nonce words and real words completely differently. Variation and statistical preferences in nonce word vowel harmony arise from stochastically ranked phonological constraints. In contrast, variation in real words like arzénnek [ɔrze:n-nɛk] ~ arzénnak [ɔrze:n-nɔk] ‘arsenic-dat’ is a matter of choosing between two distinct underlying forms. This raises the question of how statistical preferences in real words such as arzénnek ~ arzénnak ‘arsenic-dat’ should be analyzed. Such preferences no doubt exist, as H&L’s Google survey and elicitation experiment convincingly show. One possibility suggested by Bruce Hayes (p.c.) is that dual lexical entries could be marked for their frequency. This is very different from R&H’s analysis of Finnish, where statistical patterns in real words were derived from partially ranked markedness constraints.

Why do R&H and H&L’s analyses differ in this way? One reason must be that R&H assume variation in Finnish vowel harmony to be essentially phonological and therefore see no reason to believe that real words and nonce words would behave any differently. This is a sensible null hypothesis and empirically supported by Kimper and Ylitalo’s (2012) replication study. In contrast, H&L start from the observation that Hungarian vowel harmony has lexical conditions. This leads them to split the data into invariant front, invariant back, and vacillator categories, and to propose an analysis of all three in terms of lexical listing. Since nonce words cannot be lexically listed, they must receive a different analysis, and stochastic ranking is the obvious choice. Both analyses have their shortcomings. R&H have no analysis of lexically conditioned variation, which exists in Finnish, as we have seen. H&L have no analysis of statistical patterns in real words, and a fortiori, no explanation for the statistical similarity of real words and nonce words.

The analyses also differ in their grammatical implementation. H&L have a single grammar with fixed numerical ranking values and variation arises from noisy evaluation. R&H have multiple total rankings that may disagree on the winner, and variation arises from random sampling. Calling the first a single-grammar model and the second a multiple-grammars model seems a matter of perspective. As a reviewer notes, one could argue that both have multiple grammars because they produce variation by making more than one total ranking available. Conversely, one could argue that both have a single grammar because R&H’s total orders are a natural class in the sense that they can be defined as a single partial order, that is, a set of pairwise rankings. Both perspectives seem valid. However, since the terms “single grammar” and “multiple grammars” are traditional in the variation literature, we will retain them and employ them in our discussion of morphologically and lexically conditioned variation.

2.3 Tommo So

Our third example is vowel harmony in Tommo So, a Dogon language of Mali, studied by McPherson (2013) and McPherson and Hayes (2016), henceforth McP&H. Tommo So has seven contrastive vowel qualities: the front vowels /i, e, ɛ/, the back vowels /u, o, ɔ/, and /a/ which is neither front nor back. The mid vowels contrast in [±atr]: /e, o/ are [+atr], /ɛ, ɔ/ are [−atr]. There are three vowel harmony processes: Low Harmony, for example, /dàgá-ndɛ́/ ‘be.good-fact’ → [dàgándá] ‘fix’; Backness Harmony, for example, /dú:-ndɛ́/ ‘bottom-fact’ → [dù:ndó] ‘put down’; and ATR Harmony, which applies to mid vowels, for example, /dě:-ndɛ́/ ‘know-fact’ → [dě:ndé] ‘introduce’. All three processes appear to vary freely within an individual based on the observation that the same speaker may say the same word with different harmony on different occasions. In this section, we will briefly review McP&H’s main findings. Their study is useful for our purposes in two ways: it deals with variable processes that are sensitive to morphology and it illustrates the analysis of variation in terms of Maximum Entropy Grammar (Goldwater & Johnson, 2003; Hayes & Wilson, 2008), a probabilistic version of Harmonic Grammar.

McP&H’s key observation is that the application of optional vowel harmony “peters out” with morphological distance, a finding consistent with the Strong Domain Hypothesis (Kiparsky, 1984, pp. 141–143). All three harmony processes apply to the first layer of suffixation immediately next to the root and are gradually turned off as we move away from the root. (14) shows how the three vowel harmonies are interleaved with morphology in Tommo So verbs (see McP&H, p. 145). The suffixes are numbered by morphotactic position. The factitive suffix numbered six sits immediately next to the root and undergoes all three types of vowel harmony. The defocalized perfective suffix numbered one is furthest away from the root and only undergoes back harmony.


(14) [verb suffix chart not reproduced]

This “petering out” effect is both categorical and gradient. For example, Low Harmony applies to reversive suffixes, but not to transitive suffixes. Backness Harmony applies virtually categorically in roots and factitive suffixes, and the application rate steadily decreases as we move away from the root: reversive 91%, transitive 69%, mediopassive 44%, causative 18%, and defocalized perfective 14%, with the remaining morphology (all inflectional) having a Backness Harmony rate of zero. ATR Harmony is special in that it shows a steep drop from 100% to 0% at the mediopassive-causative boundary.

McP&H’s analysis is remarkably simple. The grammar consists of three markedness constraints from the Agree-family and their corresponding faithfulness constraints from the Ident-family. The analysis makes the simplifying assumption that there are only two candidates per harmony process: [+low] versus [−low], [+atr] versus [−atr], and [+back] versus [−back].


(15) [constraints not reproduced]

McP&H embed these constraints in Harmonic Grammar (HG), see, for example, Legendre, Miyata, and Smolensky (1990); Jesney (2007); Pater (2009b); Potts, Pater, Jesney, Bhatt, and Becker (2010). In HG, every constraint has a real-number weight, usually constrained to be nonnegative to keep constraints from rewarding candidates. A candidate’s harmony is a weighted sum of its constraint violations, intuitively a penalty score. Maximum Entropy (maxent) Grammar is a version of HG where harmony values are turned into probabilities, with each candidate within a single input receiving its share: the probability of a candidate is proportional to the exponential of its harmony score; see, for example, Goldwater and Johnson (2003); Hayes and Wilson (2008); Coetzee and Pater (2011). The key notions are illustrated in tableau (16), borrowed from Hayes (2017):


(16) [tableau not reproduced]

Candidate evaluation proceeds as follows: for each candidate, the violation count of each constraint is multiplied by its weight, and the results are summed. This yields the candidate’s Harmony. The greater the Harmony, the worse the candidate. In a maxent grammar, Harmony is converted into probabilities as follows: eHarmony is e to the minus harmony of the candidate, Z is the sum of eHarmony values across all candidates, and Probability is the candidate’s share of Z. Candidates with a greater Harmony receive lower probability. Maxent is just one variant of HG; for a brief introduction to other currently popular variants of HG the reader should consult Hayes (2017). The fact that a maxent grammar predicts a probability distribution over candidates makes it suitable for dealing with variation. In terms of learning, maxent grammars are attractive because there exist several provably convergent learning algorithms; for convergence and the GLA, see Pater (2008); Magri (2012); and Boersma and Pater (2016).
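The arithmetic just described can be restated compactly; the sketch below recomputes Harmony, eHarmony, Z, and Probability for a toy two-candidate tableau with assumed weights and violation counts (not the values in (16)).

```python
import math

# Assumed weights and violation profiles for a toy two-candidate tableau.
weights = {"Markedness": 3.0, "Faithfulness": 1.0}
candidates = {
    "faithful":   {"Markedness": 1, "Faithfulness": 0},
    "unfaithful": {"Markedness": 0, "Faithfulness": 1},
}

def harmony(violations, weights):
    """Weighted sum of violations: a penalty score, so higher is worse."""
    return sum(weights[c] * v for c, v in violations.items())

h = {cand: harmony(viol, weights) for cand, viol in candidates.items()}
e_harmony = {cand: math.exp(-val) for cand, val in h.items()}       # eHarmony
Z = sum(e_harmony.values())                                         # normalizing sum
probability = {cand: val / Z for cand, val in e_harmony.items()}    # share of Z

for cand in candidates:
    print(f"{cand}: Harmony = {h[cand]}, probability = {probability[cand]:.3f}")
# The candidate with greater Harmony ("faithful", H = 3) receives lower probability.
```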

McP&H analyze the “petering out” effect by supplying each Agree-constraint with a morphological weight, or a scaling factor, which ranges from one to seven, reflecting the seven levels of embedding. The more deeply embedded the constituent, the higher its scaling factor. The maxent calculation multiplies three values: the constraint’s weight, its number of violations, and its scaling factor. McP&H assume that only Agree-constraints are scaled; Ident-constraints have constant violations. The learning algorithm proceeds by fitting the constraint weights to the observed frequencies for each combination of morphological layer and harmony process, minimizing mean absolute error. McP&H report that their six constraints receive the weights Ident[atr] = 85.6, Agree[atr] = 34.8, Ident[low] = 15.2, Ident[back] = 4.0, Agree[low] = 2.8, and Agree[back] = 1.2. The predicted percentages of vowel harmony for the 20 combinations of harmony process and morphological context closely fit the observations, with a mean absolute error of 0.012.
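The effect of scaling on the predicted rates can be illustrated with a small sketch. It uses the reported weights for Backness Harmony (Agree[back] = 1.2, Ident[back] = 4.0) and the two-candidate simplification in (15); the assumption that the scaling factor runs straightforwardly from 1 (outermost suffix) to 7 (root-adjacent) is a schematic stand-in for McP&H’s actual indexing.

```python
import math

W_AGREE_BACK = 1.2   # Agree[back], scaled by morphological position
W_IDENT_BACK = 4.0   # Ident[back], not scaled

def p_harmonized(scaling_factor):
    """Probability of the harmonized candidate at a given scaling factor.
    Penalty = weight * violations * scaling factor for the scaled constraint."""
    h_harmonized = W_IDENT_BACK * 1                      # violates Ident[back] once
    h_unharmonized = W_AGREE_BACK * 1 * scaling_factor   # violates scaled Agree[back] once
    z = math.exp(-h_harmonized) + math.exp(-h_unharmonized)
    return math.exp(-h_harmonized) / z

# The predicted harmony rate rises as the scaling factor grows,
# i.e., as we move inward from the outermost suffixes toward the root.
for s in range(1, 8):
    print(s, f"{p_harmonized(s):.2f}")
```

With these two constraints, the predicted rate is a logistic (sigmoid) function of the scaling factor, the shape discussed below in connection with typological restrictiveness.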

A trademark property of HG is cumulative constraint interaction, where two weaker constraints can gang up to defeat a stronger one; for examples, see Pater (2009b). Finnish vowel harmony illustrates a cumulative interaction where the probability of the [±back] value of the suffix depends on the [±back] values of three crucial syllables: the syllable with primary stress, the syllable with secondary stress, and the syllable with the most sonorous vowel. These three factors add up in a cumulative fashion. This kind of “weak ganging-up cumulativity” (Jäger & Rosenbach, 2006, pp. 942–943) is also predicted by Partial Order OT as well as by Stochastic OT; a similar point is made by Linzen, Kasyanenko, and Gouskova (2013). In R&H’s analysis, the probability of a candidate depends on the aggregate strength of its constraint violations, as well as those of the candidate with which it competes, just as in maxent models (McPherson & Hayes, 2016, p. 151). Interestingly, Hayes and Zuraw (2013) suggest that ganging up typically occurs under stochastic conditions. If this is so, then a cumulativity-based argument for HG would have to involve “counting cumulativity” (Jäger & Rosenbach, 2006, p. 938), or better yet, cases that do not involve variation at all, as those could not be reanalyzed in terms of strict ranking.

In terms of our typology of theories, maxent is a single-grammar theory. There is a unique set of weights that determines the probability of each outcome for each input. Variation arises from sampling from this probability distribution at the time of performance. The treatment of morphological conditioning by adding a scaling factor to each constraint is a natural option in HG. It is easy to see that this method is in no way limited to morphology, but lends itself to modeling just about anything that has a systematic effect on variation. Indeed, scaling factors have been used to capture lexical frequency effects (Coetzee & Kawahara, 2013), register differences (Linzen et al., 2013), and effects of speech rate and perceptual priming (Coetzee, 2016).

A potential problem with maxent grammars lies in their typological predictions. In particular, harmonically bounded candidates can win; see Hayes and Zuraw (2013) for discussion. However, McP&H argue that maxent grammars are also typologically restrictive: their system of conflicting scalar markedness constraints and non-scalar faithfulness constraints inherently generates sigmoid probability functions, the shape we see in the Tommo So “petering out” effect, and arguably elsewhere; see Hayes (2017) for similar observations related to other versions of HG. McP&H also note that their scaling factor model is more restrictive than a model where each morpheme is indexed to a dedicated lexically specific constraint. Such a system achieves a perfect match to the Tommo So data, but at the cost of being able to match any quantitative pattern whatsoever. Here McP&H raise an important general question: what sorts of frequency patterns can a constraint system model in principle? Frequency matching is a good thing, but any theory worthy of the name should also exclude something. We will return to this question in Section 4.

3. Lexically Conditioned Variation and “Exceptions”

In free variation, a single grammatical word may be realized differently from one occasion of use to the next. Thus, the Finnish word /enkeli-i-ten/ ‘angel-pl-gen’, that is, ‘of the angels’, may be realized in at least three ways: enkeleiden, enkeleitten, and enkelien (Anttila & Cho, 1998). These variants arise from the optional application of several phonological processes. This is an instance of symbolic phonological variation: not only are the variants distinct phonemically, but they are also distinct orthographically in the standard language.

The genitive plural variation occurs across the board with trisyllabic /i/-final stems like /enkeli/ ‘angel’, but in trisyllabic /a/-final stems the variation is partly morphologically and lexically conditioned. For example, the word /omena-i-ten/ ‘of the apples’ exhibits free variation just like ‘of the angels’: all three variants omenoiden, omenoitten, and omenien are possible. However, this is a lexical exception: typically, /a/-final stems limit the first two variants to nouns and the last variant to adjectives. This is easiest to illustrate with homophonous stems. The stem /kihara/ is either the noun ‘curl’ or the adjective ‘curly’. The genitive plural /kihara-i-ten/ allows all three variants, but the choice depends on part of speech: the noun varies kiharoiden ~ kiharoitten, the adjective is kiharien. Similarly, the stem /korea/ is either the noun ‘Korea’ or the adjective ‘glamorous’, a case of accidental homophony. The genitive plural /korea-i-ten/ allows all three variants, but the choice depends on part of speech: the noun varies Koreoiden ~ Koreoitten, the adjective is koreiden.

This part-of-speech split is the result of a recent analogical change (G. Karlsson, 1978) that has turned free variation into morphophonemic alternation over the past century, partly restoring the iconic principle “one meaning, one form” in the process (R. Anttila, 1989, pp. 100–101). Finnish is currently in the middle of this change: some free variation remains, and perhaps always will, but the part-of-speech generalization is also clearly visible (A. Anttila, 2002). This diachronic change can be visualized in terms of meaning-form diagrams as follows:


[meaning-form diagrams not reproduced]

As is typical of analogical change, there are lexical exceptions, such as the noun ‘apple’ that continues to allow all three variants for unknown reasons. Another lexical exception is the noun /jumala/ ‘God’, which seems to have travelled against the current, with jumalien as the only possible variant. The variants *jumaloiden and *jumaloitten used to be acceptable, but no longer are in modern standard Finnish (G. Karlsson, 1978). Finally, it is interesting to note that the levelling is only partial: some free variation remains within nouns, for example, kiharoiden ~ kiharoitten. This variation has no known lexical conditions, but the variants differ in that the kiharoiden-type is systematically more frequent than the kiharoitten-type, again for reasons that remain a mystery.

This is one of the many examples that reveal a connection between free variation and lexically conditioned variation. Synchronically, both occur in environments where phonology fails to dictate a unique outcome; diachronically one can fade into the other. These connections should fall out from the correct synchronic theory. Ultimately, one would like to understand why phonological variation is sometimes free, but sometimes lexically and morphologically conditioned. For a different perspective, see, for example, Pater (2009a) and Mahanta (2012).6

There is an enormous literature on morphologically and lexically conditioned phonology, including lexically conditioned variation. Following Inkelas (2014), we distinguish two main approaches: those that subscribe to the single-grammar hypothesis and those that do not. An example of a single-grammar theory is Zuraw’s (2000, 2010) Dual Listing-Generation Theory, illustrated in H&L’s analysis of Hungarian vowel harmony. Zuraw’s theory maintains the single-grammar hypothesis at the cost of adding complexity to underlying forms. Invariant stems have only one listed form, either [+back] or [−back], whereas variable stems have two listed forms, both [+back] and [−back]. A high-ranking faithfulness constraint guarantees that these underlying distinctions are respected. Crucially, nonce forms have no lexical entries and are thus immune to faithfulness. The decision falls upon stochastically ranked “subterranean” markedness constraints, the result being phonology-sensitive free variation.

Another way of maintaining the single-grammar hypothesis is to posit constraints that apply to specific (classes of) lexical items; see, for example, Pater (2000, 2009a) and Moore-Cantwell and Pater (2016). Pater’s (2000) analysis of lexical exceptions in English secondary stress is an early example of this analytical technique applied to a complex set of data. For example, in cònd[ɛ̀]nsátion (from condénse), the pretonic syllable has a full vowel, but in ìnf[ə]rmátion (from infórm), the vowel is reduced. Pater’s analysis offers two ways of dealing with such patterns. One can posit a lexically specific markedness constraint that bans stress clash, leading to the absence of secondary stress and consequently a reduced vowel in ìnf[ə]rmátion. In contrast, stress clash is tolerated in cònd[ɛ̀]nsátion, and the full vowel survives on the strength of the secondary stress. Alternatively, one can posit a lexically specific faithfulness constraint that penalizes vowel reduction in cònd[ɛ̀]nsátion, but not in ìnf[ə]rmátion. An exceptionally variable lexical item such as the Finnish stem omena ‘apple’ would seem to require alternative rankings of some lexically specific constraint (Pater, 2009a).

Both the full listing theory and the lexically specific constraint theory maintain the single-grammar assumption at a price. There is only one target ranking per language and the standard learning algorithms apply. However, as a reviewer notes, learners must recognize that there are multiple underlying forms and/or lexically specific constraints, and posit the right ones. For the problem of cloning lexically specific versions of general constraints in the course of learning, see Becker (2009). The main challenge comes from the typological angle. Under the lexically specific constraint theory, one can posit individual markedness and faithfulness constraints for every morpheme in the language and rank them as the facts may require. This is descriptively convenient: McP&H note that affiliating a separate Agree constraint with each affix of Tommo So (along with one for roots) achieves a perfect fit to the vowel harmony data. That is also the problem: a constraint system that can fit any data pattern whatsoever is unsatisfactory because it excludes nothing. One way to make progress is to posit language-specific rankings that limit “exceptions” to certain environments, as H&L do in their study of Hungarian, but that does not solve the general problem. The lexically specific constraint theory is not alone in suffering from this overfitting problem, as we will see shortly.

The alternative to single-grammar theories is to assume that a language may contain coexisting phonological systems. This means that different (classes of) lexical items can be associated with different constraint rankings/weightings, or cophonologies. There is nothing particularly new about this view. For example, word phonology and phrasal phonology are usually treated as distinct modules; this assumption is implicit in any analysis that posits word-sized inputs, since abstracting away from phrase-level effects presupposes that phrasal phonology is in some sense a separate system (Kiparsky, 2015). Extending this idea below the word level yields the familiar picture of a stratified lexicon; see, for example, Kiparsky (1982, 1985, 2000, 2015); Mohanan (1986); Itô and Mester (1995, 1999); Giegerich (1999). Under this view, the phonological system of a language consists of subgrammars that may differ in constraint ranking/weighting, but may nevertheless constitute a phonologically natural class, such as a partial order; see, for example, A. Anttila (2002); Inkelas and Zoll (2007); and Zamma (2012). The genesis of such layered systems is familiar: sound changes enter phonology as postlexical variable rules and their results survive as lexical rules after the sound change is long gone (Kiparsky, 1988, 1995).

In the domain of free variation, the distinction between a single grammar where variation is treated as noise and multiple grammars where variation is treated as a choice among structural options is sometimes a matter of perspective. For example, a grammar defined as a partial order can be viewed either as a set of pairwise rankings, that is, as a single grammar, or as a set of total rankings compatible with the partial order, that is, as multiple grammars. However, in the domain of lexically conditioned variation the choice between single grammar and multiple grammar theories leads to quite different analyses. Consider again vowel coalescence in Finnish. The process is conditioned by a number of phonological and morphological factors with both categorical and statistical effects. There are three key generalizations. Coalescence is more likely in mid-low sequences than in high-low sequences. A minimal pair is hópea ~ hópee ‘silver’ where coalescence is optional, versus rásia/*rásii ‘box’ where it is categorically blocked. Coalescence is more likely in suffixes than in roots. A minimal pair is lási-a ~ lási-i ‘glass-par’ with optional coalescence, versus rásia/*rásii ‘box-par’ with no coalescence. Finally, coalescence is more likely in adjectives than in nouns. A minimal pair is mákea ~ mákee ‘sweet’, which is an adjective, versus ídea/*ídee ‘idea-par’, which is a noun. All three factors contribute to the likelihood of coalescence in a cumulative manner, with both categorical and statistical outcomes.

A simple analysis is available in terms of four constraints: the anti-hiatus constraints *ea and *ia violated by /ea/ and /ia/, respectively, Faithroot against coalescence in roots, and Faith against coalescence everywhere. The tableau in (18) shows the constraint violations. In (1ab) and (2ab) both vowels are inside the root; in (3ab) and (4ab) they straddle a morpheme boundary.


(18) [Tableau: violations of *ea, *ia, Faithroot, and Faith for candidates (1ab)–(4ab).]

We can now define various grammatical subsystems of Finnish by adding pairwise rankings as shown in (19). These grammars were found with the help of OTOrder (Djalali & Jeffers, 2015).


(19) [Pairwise rankings defining grammars (a)–(d).]

Ranking (a) defines the grammar of Colloquial Helsinki Finnish, where vowel coalescence is optionally possible everywhere except in root-internal /ia/-sequences. Ranking (b) is the system of the older upper-class female residents of the Töölö neighborhood who, in Paunonen’s corpus, allow variable coalescence in mid-low sequences but never in high-low sequences. Ranking (c) is the grammar of newscasters working for Yleisradio (Yle), Finland’s national public broadcasting company, who do not allow coalescence at all.7 Ranking (d) is the grammar of recently borrowed nouns, where coalescence is possible only in suffixes, never in roots, for example, teodikea/*teodikee ‘theodicy’, ukulele-a ~ ukulele-e ‘ukulele-par’. The four grammars are formally alike: all are partial orders. They differ in function: the first three serve a sociopolitical function in being properties of dialects and styles; the last one serves a grammatical function in being a property of a class of words. Finally, adding the ranking *ea >> *ia into the grammar of Finnish guarantees the statistical property that vowel coalescence is more common in mid-low than in high-low sequences. In this way, the grammar of an individual can be decomposed into binary rankings, where each binary ranking may carry some grammatical or semiotic value. Such binary rankings need not add up to a single total order; the ranking may remain partial and permit some amount of free variation.8
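The mechanics of such partial-order grammars are easy to emulate. The script below is a sketch under explicit assumptions: the violation profiles are a reconstruction in the spirit of the tableau in (18) (root-internal versus heteromorphemic /ea/ and /ia/ sequences), and the single pairwise ranking used here is merely one partial order that yields the pattern described for grammar (a); it is not necessarily the ranking set given in (19). Given a partial order, the script enumerates the compatible total rankings, evaluates each candidate pair under each of them, and reports each variant’s share of the rankings.

```python
from itertools import permutations

CONSTRAINTS = ["*ea", "*ia", "FaithRoot", "Faith"]

# Reconstructed violation profiles (sets of violated constraints per candidate).
TABLEAUX = {
    "root /ea/ (hopea ~ hopee)":            {"hopea": {"*ea"}, "hopee": {"Faith", "FaithRoot"}},
    "root /ia/ (rasia ~ *rasii)":           {"rasia": {"*ia"}, "rasii": {"Faith", "FaithRoot"}},
    "suffixal /e-a/ (ukulelea ~ ukulelee)": {"ukulelea": {"*ea"}, "ukulelee": {"Faith"}},
    "suffixal /i-a/ (lasia ~ lasii)":       {"lasia": {"*ia"}, "lasii": {"Faith"}},
}

# A hypothetical partial order: the single pairwise ranking FaithRoot >> *ia.
PARTIAL_ORDER = [("FaithRoot", "*ia")]

def consistent(total, pairs):
    """True if the total ranking (highest first) respects every pairwise ranking."""
    return all(total.index(hi) < total.index(lo) for hi, lo in pairs)

def winner(total, cands):
    """Standard OT evaluation with 0/1 violations: the highest-ranked
    constraint that distinguishes the candidates decides."""
    for c in total:
        losers = [k for k, viols in cands.items() if c in viols]
        if 0 < len(losers) < len(cands):
            cands = {k: v for k, v in cands.items() if k not in losers}
        if len(cands) == 1:
            return next(iter(cands))
    return next(iter(cands))

totals = [t for t in permutations(CONSTRAINTS) if consistent(t, PARTIAL_ORDER)]
for label, cands in TABLEAUX.items():
    counts = {k: sum(winner(t, cands) == k for t in totals) for k in cands}
    print(label, {k: f"{n}/{len(totals)}" for k, n in counts.items()})
```

Root-internal /ia/ comes out categorically faithful under this partial order, while the other three inputs vary; adding further pairwise rankings, such as *ea >> *ia, narrows the set of admissible total rankings and shifts the shares accordingly.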

The most obvious advantage of the cophonology theory is its simplicity. There are no lexically specific constraints. The number of constraints remains small and the predicted typologies are easy to calculate. There are two main challenges. First, the learning algorithm’s goal cannot be to find a single consistent ranking that applies to every word in the language: just as dialects can differ in terms of constraint rankings, so can syntactic categories and possibly individual words. Second, one may wonder why cophonologies within a single language tend to be rather similar. Nouns are not subject to wholly different rankings from verbs, and recently borrowed nouns differ from other nouns in only minor ways. Ultimately, the question is why words should cluster around some particular ranking instead of some other ranking, or indeed, why they should cluster at all instead of being randomly scattered around the space of available grammars. This is the cophonology variant of the overfitting problem we encountered with lexically specific constraints.

However, there is every reason to believe that an explanation of some depth exists. Lexical “exceptions” and morphological conditions are not arbitrary, but subject to broad generalizations, even universal ones. In an interesting article, Smith (2011) observes that cross-linguistically nouns tend to be phonologically more faithful than verbs, with adjectives somewhere in between, and further suggests that such effects may have a prosodic basis. There is also evidence that sublexical phonotactic regularities play a role in allowing learners to group words into lexically specific phonological patterns (Hayes et al., 2009; Szeredi, 2016). The challenge is to develop a theory that not only accommodates patterns of lexically conditioned variation in individual languages, but also explains Smith’s generalizations, and more importantly, serves as a vehicle of discovery in guiding us to new ones.

4. Universal Biases in Variation

Language learners seem to have no problem with variation. Not only are they able to learn which items vary and which do not, but they are also remarkably good at frequency matching. Becker, Ketrez, and Nevins (2011) have argued that learners are helped along by Universal Grammar, and for that reason some phonological generalizations are easier to learn than others. This brings up the interesting possibility that some quantitative patterns are not learned at all, but follow directly from Universal Grammar. In this section, we will illustrate how this happens in R&H’s analysis of Finnish vowel harmony. We will see that the quantitative patterns in R&H’s data come largely for free and that the role of phonological learning is relatively small.

Consider the violation tableaux for the four unranked constraints at the bottom of R&H’s grammar. The inputs are /analyysi/ ‘analysis’, /hieroglyfi/ ‘hieroglyph’, /afääri/ ‘affair’ and /miljonääri/ ‘millionaire’.


(20) [Violation tableaux for /analyysi/, /hieroglyfi/, /afääri/, and /miljonääri/.]

This grammar predicts variation for all inputs except the last one, where the back variant is harmonically bounded. It further predicts that the variable cases should differ statistically in particular ways, no matter how the constraints are ranked. In order to see this, consider the back harmony candidates á.na.lỳy.si.na and híe.ro.glỳ.fi.na. Both are attested, but á.na.lỳy.si.na (42%) sounds better than híe.ro.glỳ.fi.na (13%). How does the analysis predict this? Let us start by considering the rankings that are required in order to make each of these candidates win. We use a comparative tableau (Prince, 2002a, 2002b; McCarthy, 2008, pp. 45–50; Brasoveanu & Prince, 2011) whose purpose is to help the analyst find a ranking that works. In a comparative tableau, loser rows have their constraints labeled W for ‘favors the winner’ and L for ‘favors the loser’. The constraints are left unranked, as it is precisely the ranking that we are trying to figure out. We select back harmony candidates as desired winners and mark them with the pointy finger (☞).


[Comparative tableau for the desired back harmony winners.]

The two loser rows are labeled <W,L,W,L> and <L,L,W,L>. The labels are identical except for the constraint Prim, which favors back harmony in /analyysi/, but front harmony in /hieroglyfi/. The elementary ranking condition (Prince, 2002a, 2002b) states that a ranking works only if all loser-favoring constraints (L) are dominated by some winner-favoring constraint (W). In the case of híe.ro.glỳ.fi.na this means that Son must be ranked above all the other constraints. We now come to the crucial point. As a moment’s reflection will show, this ranking guarantees that á.na.lỳy.si.na must also win: if Son ranks above Prim, Sec, and NoIntb (the stronger condition), it must also rank above Sec and NoIntb (the weaker condition). The former predicts híe.ro.glỳ.fi.na, the latter á.na.lỳy.si.na. In other words, any ranking that produces híe.ro.glỳ.fi.na also produces á.na.lỳy.si.na. This is an implicational universal: implicational because of its form “If X is grammatical, then Y is grammatical,” universal because it holds true no matter how the constraints are ranked.9

Finding one implicational universal raises the possibility that there are others. It is well worth our while to find them all since implicational universals are key predictions of the theory. However, finding implicational universals manually by inspecting tableaux is a tedious exercise. The only practical way is to use a computer. OTOrder has an “Entailments” function that automatically finds implicational universals hidden in a grammar and visualizes them as a directed graph. Anttila and Andrus (2006) call such graphs T-orders, where “T” stands for “typological.” The T-order for the grammar in (20) is shown in (22).


(22) [T-order for the grammar in (20).]

Each node in a T-order is a candidate, that is, an (input, output) pair. In order to save space, we have marked inputs with numbers that correspond to the tableau in (20). The output is either a back or front harmony variant. The arrows are implicational universals. In addition, OTOrder reports the ranking volume (rv) of each candidate. For example, híe.ro.glỳ.fi.na is produced by six rankings, whereas á.na.lỳy.si.na is produced by twelve. Given the hypothesis that a candidate’s ranking volume is proportional to its relative well-formedness, híe.ro.glỳ.fi.na (13%) should sound worse than á.na.lỳy.si.na (42%), which is correct. It is important to see that this inequality follows directly from the implicational universal: since every ranking that produces híe.ro.glỳ.fi.na also produces á.na.lỳy.si.na, the ranking volume of the former can be at most as large as that of the latter. More generally, ranking volumes increase in the direction of the arrows, predicting increasing well-formedness towards the bottom of the graph. All these predictions are borne out in R&H’s data. They are statistical implicational universals: statistical because they are of the form “If X is grammatical, then Y is at least as grammatical,” universal because they hold true no matter how the constraints are ranked. The fact that they are hardwired in the theory has the important consequence that they cannot be subverted by learning.10
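The computation behind such an “Entailments” function can be emulated directly: enumerate all total rankings, record for each candidate the set of rankings under which it wins, and draw an arrow from X to Y whenever X’s set is a subset of Y’s; the size of the set is the ranking volume. The sketch below is generic; the constraint names and violation profiles in the toy example are placeholders, not R&H’s actual tableaux.

```python
from itertools import permutations

def t_order(constraints, tableaux):
    """tableaux: {input: {candidate: set of violated constraints}}.
    Returns ranking volumes and the arrows X -> Y such that every total
    ranking that selects X also selects Y."""
    def winner(total, cands):
        for c in total:
            losers = [k for k, v in cands.items() if c in v]
            if 0 < len(losers) < len(cands):
                cands = {k: v for k, v in cands.items() if k not in losers}
            if len(cands) == 1:
                return next(iter(cands))
        return next(iter(cands))

    selected = {(inp, cand): set() for inp, cands in tableaux.items() for cand in cands}
    for total in permutations(constraints):
        for inp, cands in tableaux.items():
            selected[(inp, winner(total, cands))].add(total)
    volumes = {node: len(rankings) for node, rankings in selected.items()}
    arrows = [(x, y) for x in selected for y in selected
              if x != y and selected[x] <= selected[y]]
    return volumes, arrows

# Toy example with placeholder constraints and candidates:
constraints = ["C1", "C2", "C3"]
tableaux = {
    "input1": {"1-winnerA": {"C1"}, "1-winnerB": {"C2", "C3"}},
    "input2": {"2-winnerA": {"C1", "C2"}, "2-winnerB": {"C3"}},
}
volumes, arrows = t_order(constraints, tableaux)
print(volumes)   # ranking volume of each (input, candidate) node
print(arrows)    # e.g. every ranking selecting 1-winnerB also selects 2-winnerB
```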

Some implicational universals hold across opposite vowel harmony values. For example, every ranking that produces híe.ro.glỳ.fi.na (back harmony, rv = 6) also produces á.fää.rì.nä (front harmony, rv = 18). This is empirically correct: híe.ro.glỳ.fi.na (13%) sounds worse than á.fää.rì.nä (78%). There are also candidate pairs that are not related by any implicational universal. For example, the front variants híe.ro.glỳ.fi.nä and á.fää.rì.nä both have a ranking volume of 18, predicting that both should show front harmony at the same rate, which under R&H’s theory is 75% of the time. However, this prediction crucially depends on the absence of ranking. Adding rankings can change the prediction either way: some rankings will make the ranking volume of híe.ro.glỳ.fi.nä larger than that of á.fää.rì.nä, other rankings will do the opposite. The prediction depends on the ranking and thus ultimately on the data the speaker has encountered while learning Finnish. The theory itself is silent about the relative well-formedness of híe.ro.glỳ.fi.nä and á.fää.rì.nä; this is reflected in the absence of arrows between them. In this case, we can thus expect a data-driven statistical pattern with no particular universal rationale.
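For concreteness, with four freely rankable constraints there are 4! = 24 total rankings, and under the usual uniform-sampling interpretation of Partial Order OT a variant’s predicted rate is simply its ranking volume divided by 24. This is where the 75% figure just cited comes from, and it also gives the baseline predictions for the two back harmony candidates discussed above:

```latex
P(\text{variant}) = \frac{rv}{4!} = \frac{rv}{24}:
\qquad \frac{6}{24} = 25\%, \qquad \frac{12}{24} = 50\%, \qquad \frac{18}{24} = 75\%.
```

The observed rates for the two back variants (13% and 42%) are lower than these baseline predictions; that gap is what the learning discussed below helps to close.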

The Finnish example illustrates an important general point: selecting a particular set of constraints puts many statistical patterns in principle out of reach. This means that the analyst is well advised to compute the implicational universals as early as possible, as soon as a tentative set of constraints has been posited, because the T-order may reveal that the constraints are not up to the task and will never yield a good fit to the data, no matter how they are ranked or weighted. In this way, T-orders are helpful in identifying incorrect constraints. The Finnish example also underlines the importance of variation data for phonological theory: many different sets of constraints may work equally well if all we consider is invariant patterns, but once we bring in statistical variation data those sets of constraints may be distinguishable.

The fact that a set of constraints automatically yields universal statistical predictions opens up the possibility of studying variation deductively, by first positing a set of constraints, finding their T-order, and then studying whether the statistical predictions entailed by the T-order hold true empirically. This contrasts sharply with the common practice of studying variation inductively, by first observing a variable pattern of interest, collecting data on it, and then attempting to come up with some rationale that accounts for the facts. The great advantage of the deductive approach is that it allows the theory to guide the empirical work, including data collection. A formal theory with deductive consequences has the desirable property that it immediately takes one beyond the data at hand, reveals predictions that the analyst did not anticipate, and if the theory is correct, may pleasantly surprise the analyst by explaining apparently unrelated phenomena that the theory was not set up to explain.

Not all quantitative patterns are universal. Some learning seems to be happening as well. Applying Stochastic OT/GLA to the four Finnish words yields an improved match, especially in the case of /hieroglyfi/. The result in (23) represents the best match over 10 runs with OTSoft default settings (Hayes et al., 2013). The average error per candidate varied between 4.491 and 6.375%; these are numbers reported by OTSoft based on testing the grammar for 2,000 cycles. Maxent learning (Wilson & George, 2008) yields comparable results. Again, recall that the analysis is simplified and ignores pseudo-compounding that is a crucial part of the original R&H analysis. It is possible, although perhaps not likely, that adding a minimal word constraint that formalizes Sadeniemi’s pseudo-compound generalization would result in an even more accurate T-order, to the point of making learning almost unnecessary.


(23) [Stochastic OT/GLA grammar fitted to the Finnish vowel harmony data.]
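For readers unfamiliar with the procedure, the core update step of the Gradual Learning Algorithm (Boersma, 1997; Boersma & Hayes, 2001) can be sketched as follows. Everything in the sketch is a toy stand-in: two constraints, one input with two variants at a 42/58 split loosely modeled on the /analyysi/ figures, and a fixed plasticity; OTSoft’s implementation adds refinements such as plasticity schedules and multiple inputs.

```python
import random

# Toy Stochastic OT grammar: two constraints with adjustable ranking values.
# Hypothetical violation profiles: candidate A violates only C1, candidate B
# violates only C2, so whichever constraint is ranked lower at evaluation
# time is the one whose violator wins.
values = {"C1": 100.0, "C2": 100.0}
VIOLATIONS = {"A": {"C1"}, "B": {"C2"}}
NOISE, PLASTICITY = 2.0, 0.01

def evaluate(values):
    """Sample a total ranking by adding Gaussian noise, then pick the winner."""
    noisy = {c: v + random.gauss(0, NOISE) for c, v in values.items()}
    for c in sorted(noisy, key=noisy.get, reverse=True):   # highest-ranked first
        survivors = [cand for cand, viols in VIOLATIONS.items() if c not in viols]
        if len(survivors) == 1:
            return survivors[0]
    return random.choice(list(VIOLATIONS))

def learn(values, datum):
    """One GLA update: on a mismatch, promote the constraints violated by the
    learner's (incorrect) output -- they favor the datum -- and demote the
    constraints violated by the datum."""
    output = evaluate(values)
    if output != datum:
        for c in VIOLATIONS[output]:
            values[c] += PLASTICITY
        for c in VIOLATIONS[datum]:
            values[c] -= PLASTICITY

# Training data: variant A occurs 42% of the time, B 58% of the time.
for _ in range(100_000):
    learn(values, "A" if random.random() < 0.42 else "B")

print(values)
print(sum(evaluate(values) == "A" for _ in range(10_000)) / 10_000)  # close to 0.42
```

After training, the grammar frequency-matches the input distribution, illustrating the frequency matching discussed at the beginning of this section.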

We have illustrated T-orders within Partial Order OT. This is for practical reasons: T-orders are currently easiest to calculate for OT. At least four computer programs have the required functionality: T-Order Generator (Anttila & Andrus, 2006), PyPhon (Riggle, Bane, & Bowman, 2011), OTSoft (Hayes, Tesar, & Zuraw, 2013), and OTOrder (Djalali & Jeffers, 2015). However, the notion of T-order is entirely theory-independent. T-orders are implicitly present in any theory that makes typological predictions, including Stochastic OT, HG, and parametric theories of syntax. For example, there is an implicit T-order hidden in H&L’s Stochastic OT analysis of Hungarian that correctly predicts the proportion of [+back] suffixes to be smaller in BNi-stems than in Bi-stems and smaller in BNe:-stems than in Be:-stems. To repeat, these statistical generalizations do not need to be learned because they are hardwired in the theory, suggesting that R&H and H&L’s constraints were well chosen and that something like them must be part of any analysis of Finnish and Hungarian vowel harmony. Developing algorithms for finding T-orders in other frameworks is an open research problem.

Many unnatural quantitative patterns are ruled out by the T-orders hidden in grammars. This can be seen by manufacturing a perverse dialect of Finnish that reverses the empirical proportions of [+back] and [−back] variants within each stem, keeping everything else the same. With such data, Stochastic OT/GLA fails to converge, with an average error per candidate around 50%. This is because a dialect with such quantitative properties is not only unlearnable by GLA, but entirely outside the typological space of R&H’s analysis. Coetzee and Pater (2011, pp. 422–423) demonstrate a similar point in connection with English t/d-deletion by manufacturing a fictitious dialect “Tejano-prime” that reverses the proportions of t/d-deletion in pre-vocalic and pre-consonantal positions in actual Tejano, and by then attempting to learn this dialect under Stochastic OT, Noisy HG limited to positive weights, and Maximum Entropy Grammar with both positive and negative weights. Only the last system was able to reproduce this unnatural pattern. This suggests that grammatical theories differ in their T-orders and consequently in the laws they impose on quantitative variation. In the general case, the T-order for HG is a subset of (sparser than) the T-order for OT, showing that HG is typologically less restrictive than OT, but the T-orders for classical OT, Partial Order OT, Stochastic OT, and some realistic special cases of HG turn out to be identical (Anttila & Magri, 2017). It is not clear to what extent T-orders survive in maxent grammars.

With maxent models, phonologists have come very close to the logistic regression methodology that has been practiced by sociolinguists for decades (Cedergren & Sankoff, 1974). This convergence is natural given the attention to quantitative data that now characterizes both fields. It is increasingly being recognized that statistical modeling techniques are an indispensable tool for figuring out the generalizations that shape phonological systems. However, as Liberman (1993) points out, these techniques are not theories of phonological variation, but ways of expressing the facts that such a theory should explain: multiple regression analyses do not try to explain the values attributed to factor levels, but simply to estimate them. A theory of language variation must have a more ambitious goal: to derive such quantities from linguistic theory itself instead of having to learn them from the data. As we have seen in this section, given a sufficiently rich theory much of the quantitative structure of variation falls out for free. However, it seems equally clear that we also need a theory of learning as both are likely to play a role in explaining patterns of phonological variation (Albright & Hayes, 2011).
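The formal connection to logistic regression is easy to see in code. In a Maximum Entropy grammar, a candidate’s probability is proportional to the exponential of its negative weighted violation sum; with two candidates per input this reduces to a logistic function of the weighted violation differences, which is exactly the functional form of a logistic regression over constraint violations. The weights and violation counts below are made-up illustrations, not an analysis of any of the data sets discussed above.

```python
import math

def maxent_probs(weights, candidates):
    """candidates: {name: {constraint: violation count}}.
    P(cand) is proportional to exp(-sum_i w_i * v_i(cand))."""
    harmony = {name: -sum(weights[c] * v for c, v in viols.items())
               for name, viols in candidates.items()}
    z = sum(math.exp(h) for h in harmony.values())
    return {name: math.exp(h) / z for name, h in harmony.items()}

# Hypothetical weights and violations for one input with two candidates:
weights = {"Agree": 2.0, "Ident": 1.2}
candidates = {"harmonic": {"Ident": 1}, "disharmonic": {"Agree": 1}}
print(maxent_probs(weights, candidates))
# With two candidates this equals the logistic function of the difference:
# P(harmonic) = 1 / (1 + exp(-(weights["Agree"] - weights["Ident"]))).
```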

5. Current Topics

5.1 Local Versus Global Variation

Several researchers have pointed out that OT with parallel evaluation predicts only global variation; see, for example, Riggle and Wilson (2005); Vaux (2008); Kaplan (2011); and Kimper (2011). The gist of the argument is this: if an input contains multiple potential variation sites, parallel OT requires that they be evaluated simultaneously under the same constraint ranking. It follows that all the variation sites should behave in exactly the same way. As an example, consider the optional realization of the French schwa in the phrase envie de te le demander ‘feel like asking you for it’, which potentially contains four schwas: [ãvi də tə lə dəmãde]. It seems that the decision to realize each schwa is independent of the others, barring sonority violations (Côté, 2000). This suggests that the decision is not made once and for all for all four schwas; rather, we have an example of local variation. For a more detailed discussion of the French facts and references to the extensive literature, see, for example, Côté and Morrison (2007) and Bayles et al. (2016). Among the proposed solutions are Riggle and Wilson’s (2005) Local Constraint Evaluation, Kaplan’s (2011) Markedness Suppression, and Kimper’s (2011) Serial Variation.

Kimper’s solution is to reject parallel OT in favor of Harmonic Serialism, a derivational version of OT where the evaluation is carried out one single harmonically improving step at a time; for an introduction, see McCarthy (2010). In this case, the relevant step would be the deletion or insertion of a single schwa. The gradual evaluation requirement implies that each locus of variation must be evaluated in a separate step. Combined with a theory where variation involves sampling rankings, such as Partial Order OT or Stochastic OT, each step provides an opportunity to select a different ranking. This yields local variation, as desired.
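The effect of per-step ranking sampling can be illustrated schematically. The sketch below is not Kimper’s Serial Variation in its technical detail (harmonically improving steps over full candidate sets); it only demonstrates the point of the paragraph above: because a fresh ranking is sampled at every step, each schwa site can be decided differently. The constraint names, ranking values, and noise parameter are hypothetical.

```python
import random

# Two hypothetical constraints regulating schwa: keeping the schwa violates
# *Schwa, deleting it violates Max-schwa; a site keeps its schwa whenever
# Max-schwa outranks *Schwa on that step's freshly sampled ranking.
values = {"Max-schwa": 100.0, "*Schwa": 99.0}
NOISE = 2.0

def decide_site():
    noisy = {c: v + random.gauss(0, NOISE) for c, v in values.items()}
    return "ə" if noisy["Max-schwa"] > noisy["*Schwa"] else ""

# The four potential schwa sites of 'envie de te le demander', decided one
# derivational step at a time, each under its own sampled ranking.
sites = ["d", "t", "l", "d"]
print("-".join(consonant + decide_site() for consonant in sites))
# e.g. 'də-t-lə-də': the sites need not agree, i.e., local variation
```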

Kaplan (2016) accepts the global variation argument as valid, but points out that it unrealistically simplifies the empirical situation, to the point that the data required for the argument may not actually exist. The argument requires that the sites of variation are identical in the sense that there is no constraint that can distinguish the loci. Kaplan argues that this is not so in at least three key examples: English Flapping, for example, repetitive and marketability, each with two instances of /t/ that potentially undergo flapping; French schwa; and Pima reduplication. For example, in the French string envie de te le demander, the schwas sit in very different morphosyntactic positions, which in turn involve phonological differences that Kaplan analyzes in terms of category-specific alignment.11 Finnish vowel coalescence provides another example: the word /usea-mpA-i-tA/ ‘many-comp-pl-par’ has two potential coalescence sites and we get all four variants: useampia ~ useempia ~ useampii ~ useempii. However, far from predicting identical application in all four contexts, our simple Partial Order OT grammar for Colloquial Helsinki Finnish predicts the frequencies useampia (50%) ~ useempia (25%) ~ useampii (12.5%) ~ useempii (12.5%). The corresponding counts for this particular word in Paunonen’s (1995) aggregate corpus are 22, 5, 3, and 1, respectively. The differences arise because the two environments are different in several ways: the first vowel sequence is /ea/ and both vowels belong to the root; the second is /i-a/ and both vowels belong to suffixes. An actual case of global variation of the sort envisaged by Riggle, Wilson, Vaux, and Kimper would have to involve multiple variation sites that are truly identical phonologically, morphologically, and syntactically in all the relevant respects. Such cases may be difficult to find.
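For concreteness, the corpus counts just cited correspond to the following observed proportions (simple arithmetic over the 22 + 5 + 3 + 1 = 31 tokens), which can be set against the predicted 50% ~ 25% ~ 12.5% ~ 12.5% split:

```latex
\frac{22}{31} \approx 71\%, \qquad
\frac{5}{31}  \approx 16\%, \qquad
\frac{3}{31}  \approx 10\%, \qquad
\frac{1}{31}  \approx 3\%
```

Both the predicted and the observed distributions make the same qualitative point: the two coalescence sites do not behave identically.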

5.2 Phonologically Conditioned Syntactic Variation

Studies of phonological variation typically focus on segmental phonology and prosody; for representative overviews, see Guy (2011) and Coetzee and Pater (2011). Less often discussed is the observation that phonology appears to play a role in the variable distribution of syntactic units. In phonologically conditioned syntactic variation the variables are not sounds or phonemes, but meaningful elements, such as allomorphs/morphemes, words, and phrases, with phonology conditioning both their selection and linearization. The variation is phonological in the sense that it involves free choice among linguistic forms that is conditioned by phonology, either categorically or statistically.

A familiar example of phonologically conditioned variation in allomorph selection is auxiliary contraction in English, for example, is ~ ‘s and will ~ ‘ll, which is sensitive to both segmental and phrasal phonology as already noted by Labov (1969); see also MacKenzie (2011, 2012) and A. Anttila (2017). Other examples include variation in English comparatives, for example, faster ~ more fast vs. vaster ~ more vast, which Adams (2014) analyzes in prosodic terms, and “syllable-counting allomorphy” in Estonian and Finnish where the choice of allomorph depends on foot parsing (Kager, 1996; Anttila & Cho, 1998). A general overview of phonologically conditioned allomorph selection is provided in Nevins (2011).

Going beyond morphology, the variable inclusion/omission of to in English sentences like All I want to do is (to) go to work has also been shown to be sensitive to phonological conditions. Wasow, Levy, Melnick, Zhu, and Juzek (2015) provide the following contrast:


[Examples (a) and (b) from Wasow et al. (2015).]

The presence of to is favored if the first syllable of the following verb is stressed, as in (a), and disfavored if it is unstressed, as in (b). Wasow et al. (2015) propose that to serves as a buffer syllable that optimizes prosody by preventing stress clash, assuming that is is stressed since it cannot contract. Somewhat further afield, Shih (2014) provides evidence that the choice of English personal names reflects rhythmic optimization.

Word order is traditionally considered a syntactic topic. In fact, Bennett, Elfner, and McCloskey (2016) note that syntax is sometimes defined as the study of word order. It is therefore of interest that some syntacticians have come to view word order in large part as a matter of “externalization” where phonology plays an active role (Berwick & Chomsky, 2011; Richards, 2016). For example, consider the linearization of “light” elements. An example comes from Swedish where a “light” object may appear either to the right or to the left of the negation word inte; for object shift in other Scandinavian languages, see Thráinsson (2001). Josefsson (2010, p. 6) provides the following example:


[Swedish object shift example from Josefsson (2010, p. 6).]

Swedish object shift is known to apply only to “definite, light, nonfocused nominals, and in the case of pronouns, only weak pronouns” (Holmberg, 1999, p. 22). The connection between Scandinavian object shift and phonology is argued in Erteschik-Shir (2005). In the case of Swedish, there is some evidence that the number of syllables in the object matters (Josefsson, 2010). Phonological analyses of constituent ordering phenomena include Anttila, Adams, and Speriosu (2010; English dative alternation and Heavy NP Shift); Speyer (2010; English topicalization); Agbayani, Golston, and Ishii (2015; Japanese scrambling); Shih, Grafmiller, Futrell, and Bresnan (2015; English possessives); Agbayani and Golston (2016; Latin hyperbaton); and Bennett et al. (2016; pronoun positioning in Irish). For an overview and references, see Anttila (2016).

6. Phonological Variation in Changing Times

Over the past two decades, variation and gradience have become mainstream topics in phonology. They are no longer considered mere noise to be controlled for, but data that provide a unique window into the structure of human language. This new perspective has in many ways changed the look and feel of phonology. Variation and gradience are pervasive in language, and the ability to handle them is increasingly considered an important measure of success in judging new theoretical proposals in linguistics. Optional rules were always part of the generative phonologist’s toolbox, but in current theorizing variation and quantitative data often take center stage. Rapid developments in the technological infrastructure of linguistics have also helped: variation is now easier to study than ever, thanks to phonologically annotated digital corpora, crowd-sourcing technologies, diverse experimental methods, and constantly improving computational tools. All this has had the effect of changing the work habits of phonologists and has resulted in new discoveries, new research topics, and a lively debate on what phonology is all about.

Further Reading

Anttila, A. (2016). Phonological effects on syntactic variation. Annual Review of Linguistics, 2(1), 115–137.Find this resource:

Boersma, P., & Hayes, B. (2001). Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry, 32(1), 45–86.Find this resource:

Coetzee, A. W., & Kawahara, S. (2013). Frequency biases in phonological variation. Natural Language & Linguistic Theory, 31(1), 47–89.Find this resource:

Coetzee, A. W., & Pater, J. (2011). The place of variation in phonological theory. In J. A. Goldsmith, J. Riggle, & A. Yu (Eds.), The handbook of phonological theory (2d ed., pp. 401–434). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Frisch, S. A. (2011). Frequency effects. In M. v. Oostendorp, C. J. Ewen, E. Hume, & K. Rice (Eds.), The Blackwell companion to phonology (Vol. 4, pp. 2,137–2,163). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Goldwater, S., & Johnson, M. (2003). Learning OT constraint rankings using a maximum entropy model. In J. Spenader, A. Eriksson, & Ö. Dahl (Eds.), Proceedings of the Stockholm workshop on ‘variation within Optimality Theory’ (pp. 111–120). Stockholm: Stockholm University.Find this resource:

Guy, G. R. (2011). Variability. In M. v. Oostendorp, C. J. Ewen, E. Hume, & K. Rice (Eds.), The Blackwell companion to phonology (Vol. 4, pp. 2,190–2,213). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Hayes, B., Siptár, P., Zuraw, K., & Londe, Z. (2009). Natural and unnatural constraints in Hungarian vowel harmony. Language, 85(4), 822–863.Find this resource:

Jäger, G., & Rosenbach, A. (2006). The winner takes it all—almost: Cumulativity in grammatical variation. Linguistics, 44(5), 937–971.Find this resource:

Jesney, K. (2007, July). The locus of variation in weighted constraint grammars. Poster session presented at the Workshop on Variation, Gradience and Frequency in Phonology, Stanford University, Palo Alto, CA. Retrieved from this resource:

Kaplan, A. (2016). Local optionality with partial orders. Phonology, 33(2), 285–324.Find this resource:

Kimper, W. A. (2011). Locality and globality in phonological variation. Natural Language & Linguistic Theory, 29(2), 423–465.Find this resource:

Kiparsky, P. (1993b). Variable rules. [Handout from Rutgers Optimality Workshop/NWAVE 1994]. Retrieved from

Labov, W. (1969). Contraction, deletion, and inherent variability of the English copula. Language, 45(4), 715–762.Find this resource:

Linzen, T., Kasyanenko, S., & Gouskova, M. (2013). Lexical and phonological variation in Russian prepositions. Phonology, 30(3), 453–515.Find this resource:

Pierrehumbert, J. B. (2001). Stochastic phonology. Glot International, 5(6), 195–207.Find this resource:

Zuraw, K. R. (2010). A model of lexical variation and the grammar with application to Tagalog nasal substitution. Natural Language & Linguistic Theory, 28(2), 417–472.Find this resource:

References
Adams, M. E. (2014). The comparative grammaticality of the English comparative (Doctoral dissertation). Stanford University, Palo Alto, CA.Find this resource:

Agbayani, B., Golston, C., & Ishii, T. (2015). Syntactic and prosodic scrambling in Japanese. Natural Language & Linguistic Theory, 33(1), 47–77.Find this resource:

Agbayani, B., & Golston, C. (2016). Phonological constituents and their movement in Latin. Phonology, 33(1), 1–42.Find this resource:

Albright, A., & Hayes, B. (2003). Rules vs. analogy in English past tenses: A computational/experimental study. Cognition, 90(2), 119–161.Find this resource:

Albright, A., & Hayes, B. (2011). Learning and learnability in phonology. In J. A. Goldsmith, J. Riggle, & A. Yu (Eds.), The handbook of phonological theory (2d ed., pp. 661–690). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Anttila, A. (1997). Deriving variation from grammar. In F. Hinskens, R. v. Hout, & L. Wetzels (Eds.), Variation, change and phonological theory (pp. 35–68). Amsterdam, The Netherlands: John Benjamins.Find this resource:

Anttila, A. (2002). Morphologically conditioned phonological alternations. Natural Language and Linguistic Theory, 20(1), 1–42.Find this resource:

Anttila, A. (2006). Variation and opacity. Natural Language & Linguistic Theory, 24(4), 893–944.Find this resource:

Anttila, A. (2009). Derived environment effects in colloquial Helsinki Finnish. In K. Hanson & S. Inkelas (Eds.), The nature of the word: Studies in honor of Paul Kiparsky (pp. 433–460). Cambridge, MA: MIT Press.Find this resource:

Anttila, A. (2016). Phonological effects on syntactic variation. Annual Review of Linguistics, 2(1), 115–137.Find this resource:

Anttila, A. (2017). Stress, phrasing, and auxiliary contraction in English. In V. Gribanova & S. Shih (Eds.), The morphosyntax-phonology connection: Locality and directionality at the interface (pp. 143–170). New York: Oxford University Press.Find this resource:

Anttila, A., Adams, M., & Speriosu, M. (2010). The role of prosody in the English dative alternation. Language and Cognitive Processes, 25, 946–981.Find this resource:

Anttila, A., & Andrus, C. (2006). T-orders. Retrieved from

Anttila, A., & Cho, Y. Y. (1998). Variation and change in Optimality Theory. Lingua, 104(1–2), 31–56.Find this resource:

Anttila, A., & Magri, G. (2017). T-orders across categorical and probabilistic constraint-based phonology. Paper presented at the Annual Meeting on Phonology, New York University, September 15–17, 2017.Find this resource:

Anttila, A., & Shapiro, N. T. (2017). The interaction of stress and syllabification: Serial or parallel? In A. Kaplan, A. Kaplan, M. K. McCarvel, & E. J. Rubin (Eds.), Proceedings of the 34th West Coast Conference on Formal Linguistics (pp. 52–61). Somerville, MA: Cascadilla Proceedings Project.Find this resource:

Anttila, R. (1989). Historical and comparative linguistics. Philadelphia: John Benjamins Publishing.Find this resource:

Bayles, A., Kaplan, A., & Kaplan, A. (2016). Inter- and intra-speaker variation in French schwa. Glossa: A Journal of General Linguistics, 1(1), 19, 1–30.Find this resource:

Becker, M. (2009). Phonological trends in the lexicon: The role of constraints (Doctoral dissertation). University of Massachusetts Amherst, Amherst, MA.Find this resource:

Becker, M., Ketrez, N., & Nevins, A. (2011). The surfeit of the stimulus: Analytic biases filter lexical statistics in Turkish laryngeal alternations. Language, 87(1), 84–125.Find this resource:

Beckman, J. N. (1997). Positional faithfulness, positional neutralisation and Shona vowel harmony. Phonology, 14(1), 1–46.Find this resource:

Bennett, R., Elfner, E., & McCloskey, J. (2016). Lightest to the right: An apparently anomalous displacement in Irish. Linguistic Inquiry, 47(2), 169–234.Find this resource:

Berwick, R. C., & Chomsky, N. (2011). The biolinguistic program: The current state of its development. In A. M. Di Sciullo & C. Boeckx (Eds.), The biolinguistic enterprise: New perspectives on the evolution and nature of the human language faculty (pp. 19–41). New York: Oxford University Press.Find this resource:

Benus, S., & Gafos, A. I. (2007). Articulatory characteristics of Hungarian “transparent” vowels. Journal of Phonetics, 35(3), 271–300.Find this resource:

Bermúdez-Otero, R. (2010). Currently available data on English t/d-deletion fail to refute the classical modular feedforward architecture of phonology. Paper presented at the 18th Manchester Phonology Meeting, Manchester, U.K. Handout retrieved from this resource:

Boersma, P. (1997). How we learn variation, optionality, and probability. In Proceedings of the Institute of Phonetic Sciences of the University of Amsterdam (Vol. 21, pp. 43–58). Amsterdam, The Netherlands: University of Amsterdam.Find this resource:

Boersma, P., & Hayes, B. (2001). Empirical tests of the Gradual Learning Algorithm. Linguistic Inquiry, 32(1), 45–86.Find this resource:

Boersma, P., & Pater, J. (2016). Convergence properties of a gradual learning algorithm for Harmonic Grammar. In J. J. McCarthy & J. Pater (Eds.), Harmonic Grammar and Harmonic Serialism (pp. 389–434). Sheffield, U.K.: Equinox.Find this resource:

Brasoveanu, A., & Prince, A. (2011). Ranking and necessity: the Fusional Reduction Algorithm. Natural Language and Linguistic Theory, 29(1), 3–70.Find this resource:

Bybee, J. L. (2001). Phonology and language use. Cambridge, U.K.: Cambridge University Press.Find this resource:

Bybee, J. L. (2002). Word frequency and context of use in the lexical diffusion of phonetically conditioned sound change. Language Variation and Change, 14(3), 261–290.Find this resource:

Cedergren, H. J., & Sankoff, D. (1974). Variable rules: Performance as a statistical reflection of competence. Language, 50(2), 333–355.Find this resource:

Coetzee, A. W. (2004). What it means to be a loser: Non-optimal candidates in Optimality Theory (Doctoral dissertation). University of Massachusetts Amherst, Amherst, MA.Find this resource:

Coetzee, A. W. (2006). Variation as accessing “non-optimal” candidates. Phonology, 23(3), 337–385.Find this resource:

Coetzee, A. W. (2016). A comprehensive model of phonological variation: Grammatical and non-grammatical factors in variable nasal place assimilation. Phonology, 33(2), 211–246.Find this resource:

Coetzee, A. W., & Pater, J. (2011). The place of variation in phonological theory. In J. A. Goldsmith, J. Riggle, & A. Yu (Eds.), The handbook of phonological theory (2d ed., pp. 401–434). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Coetzee, A. W., & Kawahara, S. (2013). Frequency biases in phonological variation. Natural Language & Linguistic Theory, 31(1), 47–89.Find this resource:

Côté, M. H. (2000). Consonant cluster phonotactics: A perceptual approach (Doctoral dissertation). Massachusetts Institute of Technology, Cambridge, MA.Find this resource:

Côté, M. H., & Morrison, G. S. (2007). The nature of the schwa/zero alternation in French clitics: Experimental and non-experimental evidence. Journal of French Language Studies, 17(2), 159–186.Find this resource:

De Lacy, P. (2004). Markedness conflation in Optimality Theory. Phonology, 21(2), 145–199.Find this resource:

Djalali, A. J. (2017). A constructive solution to the ranking problem in Partial Order Optimality Theory. Journal of Logic, Language and Information, 26(2), 89–108.Find this resource:

Djalali, A. J., & Jeffers, C. (2015). OTOrder [Computer software]. Palo Alto, CA: Stanford University. Retrieved from this resource:

Ernestus, M. (2011). Gradience and categoricality in phonological theory. In M. v. Oostendorp, C. J. Ewen, E. Hume, & K. Rice (Eds.), The Blackwell companion to phonology (Vol. 4, pp. 2,115–2,136). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Ernestus, M., & Baayen, R. H. (2011). Corpora and exemplars in phonology. In J. A. Goldsmith, J. Riggle, & A. Yu (Eds.), The handbook of phonological theory (2d ed., pp. 374–400). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Erteschik-Shir, N. (2005). Sound patterns of syntax: Object shift. Theoretical Linguistics, 31(1–2), 47–93.Find this resource:

Flemming, E. (2001). Scalar and categorical phenomena in a unified model of phonetics and phonology. Phonology, 18(1), 7–44.Find this resource:

Frisch, S. A. (2011). Frequency effects. In M. v. Oostendorp, C. J. Ewen, E. Hume, & K. Rice (Eds.), The Blackwell companion to phonology (Vol. 4, pp. 2,137–2,163). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Gahl, S. (2008). Time and thyme are not homophones: The effect of lemma frequency on word durations in spontaneous speech. Language, 84(3), 474–496.Find this resource:

Giegerich, H. J. (1999). Lexical strata in English: Morphological causes, phonological effects. Cambridge, U.K.: Cambridge University Press.Find this resource:

Goldwater, S., & Johnson, M. (2003). Learning OT constraint rankings using a maximum entropy model. In J. Spenader, A. Eriksson, & Ö. Dahl (Eds.), Proceedings of the Stockholm workshop on ‘variation within Optimality Theory’ (pp. 111–120). Stockholm: Stockholm University.Find this resource:

Guy, G. R. (1991). Explanation in variable phonology: An exponential model of morphological constraints. Language Variation and Change, 3(1), 1–22.Find this resource:

Guy, G. R. (2011). Variability. In M. v. Oostendorp, C. J. Ewen, E. Hume, & K. Rice (Eds.), The Blackwell companion to phonology (Vol. 4, pp. 2,190–2,213). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Hayes, B. (2004). Phonological acquisition in Optimality Theory: The early stages. In R. Kager, J. Pater, & W. Zonneveld (Eds.), Fixing priorities: Constraints in phonological acquisition (pp. 158–203). Cambridge, U.K.: Cambridge University Press.Find this resource:

Hayes, B. (2017). Varieties of noisy Harmonic Grammar. In K. Jesney, C. O’Hara, C. Smith, & R. Walker (Eds.), Proceedings of the Annual Meeting on Phonology 2016 (Vol. 4, pp. 1–17). Washington, DC: Linguistic Society of America. Retrieved from this resource:

Hayes, B., & Londe, Z. C. (2006). Stochastic phonological knowledge: The case of Hungarian vowel harmony. Phonology, 23(1), 59–104.Find this resource:

Hayes, B., & Wilson, C. (2008). A maximum entropy model of phonotactics and phonotactic learning. Linguistic Inquiry, 39(3), 379–440.Find this resource:

Hayes, B., Siptár, P., Zuraw, K., & Londe, Z. (2009). Natural and unnatural constraints in Hungarian vowel harmony. Language, 85(4), 822–863.Find this resource:

Hayes, B., Tesar, B., & Zuraw, K. (2013). OTSoft 2.5 [Computer software]. University of California, Los Angeles, Los Angeles, CA. Retrieved from this resource:

Hayes, B., & Zuraw, K. (2013). Variation in phonology [Class handouts]. Retrieved from

Häkkinen, K. (1978). Suomen yleiskielen tavuttamisesta [On the syllabification of standard Finnish]. In A. Alhoniemi & J. Kallio (Eds.), Rakenteita: Juhlakirja Osmo Ikolan 60-vuotispäiväksi 6.2.1978. Turku: Turun yliopisto.Find this resource:

Holmberg, A. (1999). Remarks on Holmberg’s generalization. Studia Linguistica, 53(1), 1–39.Find this resource:

Inkelas, S. (2014). The interplay of morphology and phonology. Oxford: Oxford University Press.Find this resource:

Inkelas, S., & Zoll, C. (2007). Is grammar dependence real? A comparison between cophonological and indexed constraint approaches to morphologically conditioned phonology. Linguistics, 44(1), 133–171. doi:10.1515/LING.2007.004.Find this resource:

Itô, J., & Mester, A. (1995). The core-periphery structure of the lexicon and constraints on reranking. In J. Beckman, S. Urbanczyk, & L. Walsh (Eds.), University of Massachusetts Occasional Papers in Linguistics (UMOP), Volume 18: Papers in Optimality Theory (pp. 181–209). Amherst, MA: GLSA.Find this resource:

Itô, J., & Mester, A. (1999). The phonological lexicon. In N. Tsujimura (Ed.), The handbook of Japanese linguistics (pp. 62–100). Malden, MA: Blackwell Publishers.Find this resource:

Jäger, G., & Rosenbach, A. (2006). The winner takes it all—almost: Cumulativity in grammatical variation. Linguistics, 44(5), 937–971.Find this resource:

Jesney, K. (2007). The locus of variation in weighted constraint grammars. Poster session presented at the Workshop on Variation, Gradience and Frequency in Phonology, Stanford University, Palo Alto, CA. Retrieved from this resource:

Josefsson, G. (2010). Object Shift and optionality: An intricate interplay between syntax, prosody and information structure. Working Papers in Scandinavian Syntax, 86, 1–24.Find this resource:

Jurafsky, D., Bell, A., Gregory, M., & Raymond, W. D. (2001). Probabilistic relations between words: Evidence from reduction in lexical production. Typological Studies in Language, 45, 229–254.Find this resource:

Kager, R. (1996). On affix allomorphy and syllable counting. In U. Kleinhenz (Ed.), Interfaces in Phonology (pp. 155–171). Berlin: Akademie Verlag.Find this resource:

Kaplan, A. (2011). Variation through markedness suppression. Phonology, 28(3), 331–370.Find this resource:

Kaplan, A. (2016). Local optionality with partial orders. Phonology, 33(2), 285–324.Find this resource:

Karlsson, F. (1982). Suomen kielen äänne- ja muotorakenne. Helsinki: Werner Söderström Osakeyhtiö.Find this resource:

Karlsson, G. (1978). Kolmi- ja useampitavuisten nominivartaloiden loppu-A:n edustuminen monikon i:n edellä. In A. Alhoniemi & J. Kallio (Eds.), Rakenteita: Juhlakirja Osmo Ikolan 60-vuotispäiväksi 6.2.1978 (pp. 86–99). Turku: Turun yliopisto.Find this resource:

Kimper, W. A. (2011). Locality and globality in phonological variation. Natural Language & Linguistic Theory, 29(2), 423–465.Find this resource:

Kimper, W., & Ylitalo, R. (2012). Variability and trigger competition in Finnish disharmonic loanwords. In N. Arnett & R. Bennett (Eds.), Proceedings of the 30th West Coast Conference on Formal Linguistics (WCCFL) (pp. 195–204). Somerville, MA: Cascadilla Proceedings Project.Find this resource:

Kiparsky, P. (1982). Lexical morphology and phonology. In The Linguistic Society of Korea (Ed.), Linguistics in the morning calm: Selected papers from SICOL-1981 (pp. 4–91). Seoul, South Korea: Hanshin Publishing Company.Find this resource:

Kiparsky, P. (1984). On the lexical phonology of Icelandic. In C.-C. Elert, I. Johansson, & E. Strangert (Eds.), Nordic prosody III (pp. 135–163). Umeå, Sweden: University of Umeå.Find this resource:

Kiparsky, P. (1985). Some consequences of lexical phonology. Phonology, 2(1), 85–138.Find this resource:

Kiparsky, P. (1988). Phonological change. In F. Newmeyer (Ed.), Linguistics: The Cambridge Survey, volume I, linguistic theory: Foundations (pp. 363–415). Cambridge, U.K.: Cambridge University Press.Find this resource:

Kiparsky, P. (1993a). Blocking in nonderived environments. In S. Hargus & E. Kaisse (Eds.), Phonetics and phonology, Volume 4: Studies in lexical phonology (pp. 277–313). San Diego, CA: Academic Press.Find this resource:

Kiparsky, P. (1993b). Variable rules. [Handout from Rutgers Optimality Workshop/NWAVE 1994]. Retrieved from

Kiparsky, P. (1995). The phonological basis of sound change. In J. A. Goldsmith (Ed.), The Handbook of Phonological Theory (pp. 640–670). Oxford: Blackwell.Find this resource:

Kiparsky, P. (2000). Opacity and cyclicity. The Linguistic Review, 17(2–4), 351–366.Find this resource:

Kiparsky, P. (2015). Stratal OT: A synopsis and FAQs. In Y. E. Hsiao & L-H. Wee (Eds.), Capturing phonological shades within and across languages (pp. 2–44). Newcastle upon Tyne, U.K.: Cambridge Scholars Publishing.Find this resource:

Kiparsky, P., & Pajusalu, K. (2003). Towards a typology of disharmony. The Linguistic Review, 20(2/4), 217–242.Find this resource:

Labov, W. (1969). Contraction, deletion, and inherent variability of the English copula. Language, 45(4), 715–762.Find this resource:

Ladd, D. R. (1996/2008). Intonational phonology. Cambridge, U.K.: Cambridge University Press.Find this resource:

Ladd, D. R. (2011). Phonetics in phonology. In J. A. Goldsmith, J. Riggle, & A. Yu (Eds.), The handbook of phonological theory (2d ed., pp. 348–373). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Legendre, G., Miyata, Y., & Smolensky, P. (1990). Harmonic Grammar—A formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations. In Proceedings of the 12th Annual Conference of the Cognitive Science Society (pp. 388–395). Hillsdale, NJ: Lawrence Erlbaum Associates.Find this resource:

Liberman, M. Y. (1993). Optionality and optimality. Fragment of a draft. Department of Linguistics, University of Pennsylvania.Find this resource:

Linzen, T., Kasyanenko, S., & Gouskova, M. (2013). Lexical and phonological variation in Russian prepositions. Phonology, 30(3), 453–515.Find this resource:

MacKenzie, L. (2011). English auxiliary contraction as a two-stage process: Evidence from corpus data. In J. Choi, E. A. Hogue, J. Punske, D. Tat, J. Schertz, & A. Trueman (Eds.), Proceedings of the 29th West Coast Conference on Formal Linguistics (WCCFL) (pp. 152–160). Somerville, MA: Cascadilla Proceedings Project.Find this resource:

MacKenzie, L. (2012). Locating variation above the phonology (Doctoral dissertation). University of Pennsylvania, Philadelphia.Find this resource:

Mahanta, S. (2012). Locality in exceptions and derived environments in vowel harmony. Natural Language & Linguistic Theory, 30(4), 1,109–1,146.Find this resource:

Magri, G. (2012). Convergence of error-driven ranking algorithms. Phonology, 29(2), 213–269.Find this resource:

McCarthy, J. J. (2007). What is Optimality Theory? Language and Linguistics Compass, 1(4), 260–291.Find this resource:

McCarthy, J. J. (2008). Doing Optimality Theory. Malden, MA: Blackwell Publishing.Find this resource:

McCarthy, J. J. (2010). An introduction to harmonic serialism. Language and Linguistics Compass, 4(10), 1,001–1,018.Find this resource:

McCarthy, J. J., & Prince, A. (1995). Faithfulness and reduplicative identity. In J. Beckman, S. Urbanczyk, & L. Walsh (Eds.). University of Massachusetts Occasional Papers in Linguistics (UMOP), Volume 18: Papers in Optimality Theory (pp. 249–384). Amherst, MA: GLSA.Find this resource:

McPherson, L. (2013). A grammar of Tommo So. Berlin: De Gruyter Mouton.Find this resource:

McPherson, L., & Hayes, B. (2016). Relating application frequency to morphological structure: The case of Tommo So vowel harmony. Phonology, 33(1), 125–167. doi:10.1017/S0952675716000051.Find this resource:

Mohanan, K. P. (1986). The theory of lexical phonology. Dordrecht, The Netherlands: Reidel.Find this resource:

Moore-Cantwell, C., & Pater, J. (2016). Gradient exceptionality in Maximum Entropy Grammar with lexically specific constraints. Catalan Journal of Linguistics, 15, 53–66.Find this resource:

Myers, J. (1995). The categorical and gradient phonology of variable t-deletion in English. Paper presented at the International Workshop on Language Variation and Linguistic Theory, Nijmegen, the Netherlands. Retrieved from this resource:

Nagy, N., & Reynolds, B. (1997). Optimality Theory and variable word-final deletion in Faetar. Language Variation and Change, 9(1), 37–55.Find this resource:

Nevins, A. (2011). Phonologically conditioned allomorph selection. In M. v. Oostendorp, C. J. Ewen, E. Hume, & K. Rice (Eds.), The Blackwell companion to phonology (Vol. 4, pp. 2,357–2,382). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Oostendorp, M. v. (1997). Style levels in conflict resolution. In F. Hinskens, R. v. Hout, & L. Wetzels (Eds.), Variation, change and phonological theory (pp. 207–229). Amsterdam, the Netherlands: John Benjamins.Find this resource:

Pater, J. (2000). Non-uniformity in English secondary stress: The role of ranked and lexically specific constraints. Phonology, 17(2), 237–274.Find this resource:

Pater, J. (2008). Gradual learning and convergence. Linguistic Inquiry, 39(2), 334–345.Find this resource:

Pater, J. (2009a). Morpheme-specific phonology: Constraint indexation and inconsistency resolution. In S. Parker (Ed.), Phonological argumentation: Essays on evidence and motivation (pp. 123–154). London: Equinox.Find this resource:

Pater, J. (2009b). Weighted constraints in generative linguistics. Cognitive Science, 33(6), 999–1,035.Find this resource:

Paunonen, H. (1995). Suomen kieli Helsingissä: Huomioita Helsingin puhekielen historiallisesta taustasta ja nykyvariaatiosta. Helsinki: Helsingin yliopiston suomen kielen laitos.Find this resource:

Pierrehumbert, J. (1994). Knowledge of variation. In K. Beals, J. Denton, R. Knippen, L. Melnar, H. Suzuki, & E. Zeinfeld (Eds.), Papers from 30th Regional Meeting of the Chicago Linguistic Society, Volume 2: The parasession of variation in linguistic theory (pp. 232–256). Chicago: Chicago Linguistic Society.Find this resource:

Pierrehumbert, J. B. (2001). Stochastic phonology. Glot International, 5(6), 195–207.Find this resource:

Potts, C., Pater, J., Jesney, K., Bhatt, R., & Becker, M. (2010). Harmonic Grammar with linear programming: From linear systems to linguistic typology. Phonology, 27(1), 77–117.Find this resource:

Prince, A. S. (1990). Quantitative consequences of rhythmic organization. In M. Ziolkowski, M. Noske, & K. Deaton (Eds.), Papers from the 26th Regional Meeting of the Chicago Linguistic Society, Volume 2: The parasession on the syllable in phonetics & phonology (pp. 355–398). Chicago: Chicago Linguistic Society.Find this resource:

Prince, A., & Smolensky, P. (1993/2004). Optimality Theory: Constraint interaction in generative grammar. Malden, MA: Blackwell.Find this resource:

Prince, A. (2002a). Entailed ranking arguments. Retrieved from

Prince, A. (2002b). Arguing Optimality. Retrieved from

Prince, A., & Tesar, B. (2004). Learning phonotactic distributions. In R. Kager, J. Pater, & W. Zonneveld (Eds.), Constraints in phonological acquisition (pp. 245–291). Cambridge, U.K.: Cambridge University Press.Find this resource:

Reynolds, W. T. (1994). Variation and phonological theory (Doctoral dissertation). University of Pennsylvania, Philadelphia.Find this resource:

Richards, N. (2016). Contiguity theory. Cambridge, MA: MIT Press.Find this resource:

Riggle, J. (2010). Sampling rankings. Retrieved from

Riggle, J., Bane, M., & Bowman, S. (2011). PyPhon [Software package]. Retrieved from

Riggle, J., & Wilson, C. (2005). Local optionality. In L. Bateman & C. Ussery (Eds.), Proceedings of the thirty-fifth annual meeting of the North East Linguistic Society (NELS 35) (pp. 539–550). Amherst, MA: GLSA.Find this resource:

Ringen, C. O., & Kontra, M. (1989). Hungarian neutral vowels. Lingua, 78(2), 181–191.Find this resource:

Ringen, C. O., & Heinämäki, O. (1999). Variation in Finnish vowel harmony: An OT account. Natural Language & Linguistic Theory, 17(2), 303–337.Find this resource:

Sadeniemi, M. (1949). Metriikkamme perusteet. Helsinki: Suomalaisen Kirjallisuuden Seura.Find this resource:

Shih, S. (2014). Towards optimal rhythm (Doctoral dissertation). Stanford University, Palo Alto, CA.Find this resource:

Shih, S., Grafmiller, J., Futrell, R., & Bresnan, J. (2015). Rhythm’s role in genitive construction choice in spoken English. In R. Vogel & R. van de Vijver (Eds.), Rhythm in cognition and grammar: A Germanic perspective (pp. 207–243). Berlin: De Gruyter Mouton.Find this resource:

Siptár, P., & Törkenczy, M. (2000). The phonology of Hungarian. Oxford: Oxford University Press.Find this resource:

Smith, J. L. (2011). Category-specific effects. In M. v. Oostendorp, C. J. Ewen, E. Hume, & K. Rice (Eds.), The Blackwell companion to phonology (Vol. 4, pp. 2,439–2,463). Chichester, U.K.: Wiley-Blackwell.Find this resource:

Speyer, A. (2010). Topicalization and stress clash avoidance in the history of English (Vol. 69). New York: Walter de Gruyter.

Szeredi, D. (2016). Exceptionality in vowel harmony (Doctoral dissertation). New York University, New York.

Thráinsson, H. (2001). Object shift and scrambling. In M. Baltin & C. Collins (Eds.), The handbook of contemporary syntactic theory (pp. 148–202). Malden, MA: Blackwell.

Vaux, B. (2008). Why the phonological component must be serial and rule-based. In B. Vaux & A. Nevins (Eds.), Rules, constraints, and phonological phenomena (pp. 20–60). Oxford: Oxford University Press.

Välimaa-Blum, R. (1999). A feature geometric description of Finnish vowel harmony covering both loans and native words. Lingua, 108(4), 247–268.

Walker, R. (2012). Vowel harmony in Optimality Theory. Language and Linguistics Compass, 6(9), 575–592.

Wasow, T., Levy, R., Melnick, R., Zhu, H., & Juzek, T. (2015). Processing, prosody, and optional to. In L. Frazier & E. Gibson (Eds.), Explicit and implicit prosody in sentence processing (pp. 133–158). Berlin: Springer.

Wilson, C., & George, B. (2008). Maxent grammar tool [Software package]. Retrieved from

Zamma, H. (2012). Patterns and categories in English suffixation and stress placement: A theoretical and quantitative study (Doctoral dissertation). University of Tsukuba, Tsukuba, Japan. Published in 2013 by Kaitakusha.

Zsiga, E. C. (1995). An acoustic and electropalatographic study of lexical and postlexical palatalization in American English. In B. Connell & A. Arvaniti (Eds.), Phonology and phonetic evidence: Papers in laboratory phonology IV (pp. 282–302). Cambridge, U.K.: Cambridge University Press.

Zuraw, K. R. (2000). Patterned exceptions in phonology (Doctoral dissertation). University of California, Los Angeles, Los Angeles, CA.

Zuraw, K. R. (2010). A model of lexical variation and the grammar with application to Tagalog nasal substitution. Natural Language & Linguistic Theory, 28(2), 417–472.


(1.) I thank Vivienne Fong, Bruce Hayes, Paul Kiparsky, Giorgio Magri, Joe Pater, and Cathie Ringen for their input and Mohammed Rahman for editorial help. I have benefited considerably from reading Hayes and Zuraw’s (2013) excellent course materials on phonological variation, which are reflected throughout this review. All errors and misinterpretations are mine. The following abbreviations are used: comp = comparative, dat = dative, ess = essive, gen = genitive, par = partitive, pl = plural.

(2.) I owe these diagrams to R. Anttila (1989).

(3.) Our constraint definitions are simplified. The reader should consult R&H for full details.

(4.) The analysis makes a number of unobvious assumptions about underspecification. In the candidate [sýn.tak.sì.na], the neutral vowel /i/ counts as underspecified on the surface for the purposes of spreading: [+back] spreads from [tak] to [na], skipping the intervening /i/ (R&H, p. 318). This is crucial if we want (a) to win. The analysis also assumes that neutral vowels unspecified for backness are interpreted as front (R&H, p. 317). For this reason, we assume that, for the purposes of Secondary, the /i/ counts as front, although that is not crucial here.

(5.) For a radical phonetic reinterpretation of híd-stems, see Benus and Gafos (2007). For an experimental evaluation and an alternative proposal based on sublexical regularities, see Szeredi (2016).

(6.) The notion of “exception” is often unhelpful. Some “exceptions” are in fact small regularities; examples include Hungarian híd-stems (Siptár & Törkenczy, 2000, p. 68) and English “irregular” past tenses cling–clung, fling–flung, sling–slung (Albright & Hayes, 2003). If two patterns are more or less equally robust, it is not clear which is the rule and which is the exception. More importantly, “exceptions” are not linguistic phenomena on a par with, say, “syllabification” or “vowel harmony.” Exceptions are not in the nature of things. By definition, they are phenomena our current best theory cannot explain. This is evident from the fact that different theories have different exceptions. In this light, it is hard to assign any coherent meaning to the term “theory of exceptions” often used in phonology.

(7.) This can be verified by listening to Yleisradio. The strict ban on coalescence is almost certainly a conscious choice as the newscasters are likely to speak some variable dialect in less formal circumstances. For an early optimality-theoretic proposal on stylistic variation, see van Oostendorp (1997).

(8.) An alternative analysis of the Finnish facts is given by Kaplan (2011) in terms of Markedness Suppression Theory, where free variation is derived from the variable suppression of markedness violations, but alternative rankings are retained for the analysis of lexically conditioned variation. Yet another analysis could be based on the metrical hypothesis that recently borrowed noun stems are exhaustively footed and participate in a different metrical grammar.

(9.) This implicational universal can be derived by Prince’s (2002a, 2002b) rules of inference in two steps: (i) A row entails any other row that can be derived from it by replacing an L with an empty cell (L-retraction); (ii) A row entails any other row that can be derived from it by replacing an empty cell with a W (W-extension). A clear textbook discussion is provided by McCarthy (2008, pp. 124–132).
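To make the two inference steps concrete, the following minimal sketch (in Python; the constraint names and rows are hypothetical illustrations, not Prince’s or McCarthy’s own examples) treats a single ERC row as an assignment of W, L, or e (empty cell) to each constraint and tests entailment by allowing only L-retraction and W-extension at each coordinate.

# A sketch of single-row ERC entailment under the assumptions stated above.
# Row A entails row B iff B can be reached from A by L-retraction (L -> e)
# and W-extension (e -> W), i.e., no coordinate moves down the order L < e < W.
ORDER = {'L': 0, 'e': 1, 'W': 2}

def entails(row_a, row_b):
    """True if ERC row_a entails row_b; both rows cover the same constraint set."""
    return all(ORDER[row_b[c]] >= ORDER[row_a[c]] for c in row_a)

# Hypothetical example over three constraints:
a = {'Agree': 'W', 'Ident': 'L', 'NoCoda': 'e'}
b = {'Agree': 'W', 'Ident': 'e', 'NoCoda': 'W'}  # one L-retraction, one W-extension
print(entails(a, b))  # True
print(entails(b, a))  # False: a W can never be weakened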

(10.) An anonymous reviewer suggests that the last statement is true only if we assume that all available rankings have equal probabilities of being selected. This is not so, as can be seen from the following consideration that I owe to Giorgio Magri. T-orders are defined in terms of inclusion relations among sets of rankings (Djalali, 2017). The rankings that map /hieroglyfi-na/ to híỳ (the antecedent mapping) are a subset of the rankings that map /analyysi-na/ to á.na.lỳ (the consequent mapping). Call these sets S1 and S2. Even if some bias favored some specific ranking in S1, it would also favor the same ranking in S2 because all the rankings in S1 are also members of S2. Thus, it does not matter whether all the rankings have the same probability or whether they have different probabilities because S1 is a subset of S2 by the definition of T-order.
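The subset reasoning in this note can be illustrated with a small computational sketch (Python). The tableaux, candidate names, and violation counts below are hypothetical placeholders rather than the actual Finnish analysis: the sketch enumerates all total rankings of a toy constraint set, collects the rankings that yield the antecedent and consequent mappings, and then tests set inclusion, which is how the T-order is defined.

from itertools import permutations

def winner(ranking, candidates):
    """Optimal candidate under a total ranking; candidates maps names to violation dicts."""
    best = list(candidates)
    for con in ranking:
        fewest = min(candidates[cand].get(con, 0) for cand in best)
        best = [cand for cand in best if candidates[cand].get(con, 0) == fewest]
        if len(best) == 1:
            break
    return best[0]  # any remaining tie is resolved arbitrarily in this sketch

def rankings_for(candidates, target, constraints):
    """Set of total rankings under which `target` wins."""
    return {r for r in permutations(constraints) if winner(r, candidates) == target}

# Hypothetical tableaux for two inputs sharing the constraint set C1-C3:
constraints = ('C1', 'C2', 'C3')
input1 = {'out_a': {'C2': 1}, 'out_b': {'C1': 1, 'C3': 1}}
input2 = {'out_c': {'C2': 1}, 'out_d': {'C1': 1, 'C3': 2}}

S1 = rankings_for(input1, 'out_a', constraints)  # antecedent mapping
S2 = rankings_for(input2, 'out_c', constraints)  # consequent mapping
print(S1 <= S2)  # True iff the antecedent mapping implies the consequent mapping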

(11.) Note that the French example is neither a word nor a sentence, but an NP, suggesting that the global variation argument may be implicitly assuming a cyclic evaluation. However, under a cyclic analysis, the schwas would differ in terms of their cyclic status, with much the same results as under Kaplan’s analysis.