date: 26 May 2017

Languages of the Balkans

1. Defining Balkan Linguistics: Basic Terms and Goals of the Discipline

It is important to note the distinction between “languages of the Balkans” and “Balkan languages” as employed in the field of Balkan linguistics. The former expression can be used for any language that happens to be spoken in the Balkan peninsula (whose northern boundary is problematic but is of no concern here), for example German, Yiddish, Circassian, or Hungarian. The Balkan languages, on the other hand, are those languages that are indigenous to the Balkans in the sense that they took their definitive shape there (as explained in §1.1), and they constitute the first group of languages whose similarities were explained in modern linguistic terms as a result of language contact rather than as a result of descent from a common ancestor. Nikolai Trubetzkoy (1930) coined the term Sprachbund ‘linguistic league’ (as opposed to Sprachfamilie ‘language family’) to describe this relationship. Balkan linguistics, as both a subset of and precursor to contact linguistics, is, at its base, a historical linguistic discipline. It seeks to explain similarities among the relevant languages as the result of diffusion (cf. Labov, 2007) or convergence rather than of either transmission (Labov, 2007), the domain of genealogical linguistics, or putative universal, typological properties of human language (which latter assumes parallel developments whose causation is ahistorical, i.e., unconnected with either contact or ancestry). The relevant languages that constitute the object of Balkan linguistics, with the exception of Turkic, are all part of the Indo-European language family, but they belong to five distinct groups that are known to have been separated for a significant length of time (millennia). Moreover, for four out of five Indo-European groups as well as for Turkic, there exists documentation that goes back more than a millennium, and in some cases several millennia. The Balkan languages are thus the oldest example of a well-documented and still living Sprachbund.

The primary questions that Balkan linguistics seeks to answer are these: what are the results of language contact in the Balkan languages, and how did they come about? The Balkan languages are traditionally defined as Albanian, Modern Greek, Balkan Romance (Romanian, Aromanian, and Meglenoromanian), and Balkan Slavic (Bulgarian, Macedonian, and the southernmost dialects of the former Serbo-Croatian, which dialects are called Torlak). In recent decades, it has been recognized that the relevant dialects of Romani, Judezmo, Turkish, and Gagauz also participate in at least some of the convergent processes that are taken as definitive of the Balkan linguistic league. While the language family as defined by Trubetzkoy is characterized by regular sound correspondences, which in turn help define shared morphology and a core lexicon, the Balkan linguistic league is defined principally by shared morphosyntactic developments and a shared lexicon of borrowings called “cultural” by Trubetzkoy. In the Balkan linguistic league, phonological developments are sometimes shared among different languages at the dialectal level, but there are no such features that characterize the Balkan languages as a group. Just as in the language family not every diagnostic item is represented in every branch, so, too, in the Balkan linguistic league not every feature is equally represented in all languages and dialects.

1.1. A Brief Overview of External History

1.1.1. Albanian and Greek

The speakers of the Indo-European language that would become Albanian and the speakers of the Indo-European language (Hellenic) that would become Modern Greek were the first to enter the Balkan Peninsula whose languages are of relevance for modern Balkan linguistics. Both languages displaced earlier languages, some of which may also have been Indo-European and some of which definitely were not. There are no direct attestations of these earlier languages, and they are therefore of no concern here. In the case of Greek, speakers entered the peninsula some time during the first half of the 2nd millennium bce. From the second half of that millennium, there is Mycenaean Greek written using the so-called Linear B syllabary, but there is a gap in the written record from the collapse of Mycenaean civilization ca. 1200 bce until the earliest material using the Phoenician-derived alphabet from the 8th century bce.

In the case of modern Albanian, there is the vexed question concerning which of the non-Hellenic Indo-European languages spoken in the ancient Balkans prior to the Roman annexation (begun in the 2nd century bce) is that language’s ancestor. Albanian scholars have always held that the language ancestral to Albanian was Illyrian, while non-Albanian scholars have been divided among those supporting the Illyrian hypothesis and those that see the Thraco-Moesian complex as ancestral to Albanian. Owing to the poor or utterly lacking attestations of the various relevant ancient languages of the Balkans, a definitive answer remains elusive (see Hamp, 1966; Katičić, 1976; Woodard, 2004). Nonetheless, Eric Hamp, who has been the most careful and conservative of specialists on Albanian dealing with this issue, has decided that the evidence linking Albanian to Messapic, which in turn has been linked to Illyrian, is sufficient to see Albanian as a descendant of a sister-language to Illyrian, if not a descendant of Illyrian itself (reported in Lezo, 2008). The oldest unambiguously Albanian dated document is a baptismal formula from 1462, and the oldest extensive such document is the Missal (Mëshari) of Gjon Buzuku from 1555.

1.1.2. Balkan Romance

As just indicated, the Romans began their annexation of the Balkans in the 2nd century bce, a process which can be said to have reached its high point with the annexation of part of Dacia (in modern Romania) 107–271 ce. While Latin was the language of administration in the Roman Empire, Greek continued to be the language of inscriptions in the southern Balkans and retained significant prestige throughout the Empire. The scholars Konstantin Jireček and Petar Skok identified boundaries north of which inscriptions are in Latin and south of which they are in Greek. The Jireček line runs from what is today Lesh (ancient Lissus), on the Adriatic coast of Albania, across Mirëdita (northern Albania) and Debar (Republic of Macedonia, Albanian Dibra), between Skopje (Scupi) and Stobi (near Prilep), then north to Niš (Serbia, ancient Naissus) and southeast, south of Sofia (Bulgaria, ancient Serdica) and across the Balkan (Haemus) mountains to the Black Sea at Varna (Odessos) (Jireček, 1911, 38–39). Greek was established along the Black Sea coast as far north as Tulcea (Romania, ancient Aegissus). Skok (1934) had the line starting at Vlora (ancient Valona) and moving northeast through Ohrid (Lychnidus) to Sofia (Serdica) and across the Balkan range to Varna. It is now generally acknowledged that the difference between the two lines represents a zone of cultural bilingualism. Latin persisted as a written language in the Balkans until the 6th century ce, at which point inscriptions cease. The next datable written evidence of Balkan Romance does not appear until 1521, in a letter sent by Neacşu of Câmpulung to the mayor of Braşov warning him of an impending Ottoman attack. During the course of this thousand-year gap, Balkan Latin developed into the Balkan Romance languages: Romanian north of the Danube and Aromanian and Meglenoromanian south of the Danube, and today found south of the Jireček line (although communities in diaspora live throughout the Balkans and beyond). 
The term Vlah in general refers to Aromanian and Meglenoromanian, but it is sometimes also used for Romanian dialects spoken in eastern Serbia.

1.1.3. Balkan Slavic

Slavic speakers began to arrive in the Balkan peninsula from northeastern Europe in the 6th and 7th centuries ce (Fine, 1991, 25–73), at a time when Common Slavic was breaking up. Slavic literacy began in the 9th century ce, and the non-East Slavic language of the oldest documents (those preserving distinctions between short high vowels, nasal vowels, and other archaisms), up to about 1100 ce, is called Old Church Slavonic. After that, the languages are referred to as recensions of Church Slavonic or by modern language names. Unlike Greek, Albanian, and Balkan Romance, whose dialects spoken in the Balkan Peninsula are all part of the subject matter of Balkan linguistics, the South Slavic dialect continuum is divided between Balkan Slavic and non-Balkan Slavic. The former includes Bulgarian, Macedonian, and the southeastern dialects of the former Serbo-Croatian known as Torlak, while the latter are the remaining dialects of the former Serbo-Croatian as well as Slovene. The former Serbo-Croatian is also referred to as Bosnian/Croatian/Serbian (BCS) or Bosnian/Croatian/Montenegrin/Serbian (BCMS).

1.1.4. Romani, Turkish, and Judezmo

Speakers of the Middle Indic dialects that became Romani migrated from what is today west-central India to the Byzantine Empire, probably ca. 900–1100 ce. The exact timing and route are the source of debate and speculation, but it is clear that Romani took its modern shape in intense contact with Greek and some contact with Slavic, before speakers began to leave the Balkans at about the time of the Ottoman invasions. While all dialects of Romani show some Balkan features, only those that remained in the Ottoman Balkans participated fully in Balkan convergent processes. The oldest Romani documents are from 16th-century England and Holland. The oldest document with Balkan Romani dates from 1668 (Friedman & Dankoff, 1991).

Although speakers of Turkic languages came in various groups to the Balkan peninsula from late antiquity into the Middle Ages, there is no direct evidence that Turkic had any lasting linguistic impact on the Balkan peninsula, aside from the occasional lexical item, until the arrival of speakers of the Oghuz Turkic dialects that were or would become Gagauz and Turkish. Turkic-speaking mercenaries were used in Europe by feuding Byzantine factions during the late Middle Ages, but the beginning of the Turkish political conquest of Europe can be dated to 1354, when the Ottoman Turks crossed the Dardanelles (Hellespont, Çanakkale Boğazı) and occupied Gallipoli (Gelibolu). They took Edirne (Adrianople) in 1369, and in 1371 defeated Serbian-led forces at Chernomen on the river Marica, near the present Greek-Turkish-Bulgarian border. Although the battle of Kosovo Polje (1389) is given more prominence in various modern national mythologies, it was the 1371 battle on the Marica at Chernomen (known in Turkish as Sırp sındığı ‘destruction of the Serbs’) that was decisive in opening the way for the eventual Ottoman conquest of the Balkan Peninsula (Fine, 1994, pp. 379–382). From the end of the 14th century until the beginning of the 20th, the heartland of Balkan linguistic phenomena, now the territories of Albania, Kosovo, southern Montenegro, southern Serbia, the Republic of Macedonia, Northern Greece, Southern Bulgaria, and European Turkey, was Ottoman, although at times Turkish rule extended over the entire Balkan Peninsula, except what are today Slovenia and northwestern Croatia. Although some linguistic phenomena have their origins in antiquity or the Middle Ages, it was during the Ottoman period that the Balkan languages took their definitive modern shape and that the phenomena most characteristic of the Balkan linguistic league took their modern form. 
In terms of documentation, the oldest Turkic documents are the 8th-century ce Orkhon inscriptions, from what is today central Mongolia. The language of these inscriptions, however, shows striking similarities to Oghuz Turkic (Tekin, 1968). Although Standard Turkish differs significantly from the Balkan languages, a number of local dialects, especially Gagauz in eastern Bulgaria and West Rumelian Turkish in the western Balkans, show significant Balkan linguistic convergences.

Many Jews speaking the local dialects of the Iberian Peninsula, having been expelled from Spain (1492) and Portugal (1497), found refuge in the Ottoman Empire. Thenceforward their home language developed independently of the dialects of Spain and Portugal and in contact with Greek, Turkish, and other Balkan languages. The dialects of this language, known as Judezmo or Ladino or Judeo-Spanish, have, in the Balkans, participated to some extent in Balkan linguistic convergences.

Figure 1. Languages and Dialects of the Balkan Sprachbund.

(Source: p. 203 in Victor A. Friedman (2007), Balkanizing the Balkan Sprachbund: A closer look at grammatical permeability and feature distribution, in A. Y. Aikhenvald & R. M. W. Dixon (Eds.), Grammars in contact: A cross-linguistic typology (pp. 201–219), Oxford: Oxford University Press. © Victor A. Friedman 2007.)

Many of the Balkan languages are overlapping and co-territorial, and it would therefore be misleading or inaccurate to label a specific territory with a specific language. While nation-state languages dominate in most of the territories of the respective nation-states, there are regions of various sizes in all of them where such is not the case. Moreover, at the beginning of the 20th century all seven groups were represented on all the territories that would become today’s nation-states, and in most states this is still the case today. The Aromanian isogloss bundle roughly follows the Macedonian one after intersecting with it.

2. Dialectology

Balkan linguistics is largely a linguistics of dialects, themselves the results of historical divergences, and therefore a few words about the basic dialect divisions of the relevant languages are needed here. With the exception of Tsakonian, which displays archaisms inherited directly from Doric and has the status of a separate Hellenic language, all the dialects of Modern Greek are descended from the Hellenistic Attic Koine that replaced all the Ancient Greek dialects. The main dialect division, between north and south, runs through the Gulf of Corinth and then north of Attica. Albanian has a major dialect division between Geg and Tosk. The Geg dialects are spoken north of the river Shkumbî in Albania, with transitional isoglosses extending no more than 20 kilometers south of the Shkumbî at the widest point. The dialects south of the Shkumbî are called Tosk. The Tosk dialects of southwestern Albania are Lab, those of Epirus are Çam, and those further south are Arvanitika (autonym Arbërisht). The Tosk dialects of Italy are Arbëresh. The division between Geg and Tosk dates from after the Roman conquest, but was in place or completed at the time Slavic speakers entered the peninsula. The dividing line used to run along the river Drin in the middle of Struga (Republic of Macedonia) and across Lake Ohrid north of Resen and Bitola (recent mobility has complicated the distribution). For Macedonian, the most important isogloss bundle follows the course of the river Vardar to the river Crna and continues southwest into Greece. The Torlak dialects of the former Serbo-Croatian can be divided into the Timok-Nišava dialects, constituting approximately the eastern half of Torlak in Serbia, and the Prizren–South Morava dialects, comprising southern Kosovo and the western part of Torlak in Serbia. For Bulgarian, a major isogloss, defined by the reflex of Common Slavic *ē as /e/ (to the west) versus something else in at least some environments (/ja/, /æ/, etc.), separates roughly the eastern two-thirds from the western third of the territory. The divisions between eastern and western Balkan Judezmo and East versus West Rumelian (Balkan) Turkish follow roughly the same boundary (Hazai, 1961). The Gagauz of the Republic of Moldova is more influenced by Russian, while the dialects of Romania and Bulgaria are influenced by Romanian and Bulgarian, respectively. The main Romanian dialects are those of Moldavia (including Bessarabia, Dobrudja, and Bukovina), Wallachia, and Transylvania, for which Banat, Crişana, and Maramureş are the defining regions. The main dialectal boundary for Aromanian follows the mountains that form the Greek-Albanian political border and then the Crna-Vardar route in Macedonia. Meglenoromanian is spoken today in only seven villages near Lake Dojran, five in Greece and two in Macedonia. Of these, the village of Tsărnareka (Greek Kárpē), in Greece, is more heavily Slavicized than the others. Istro-Romanian separated from Romanian and moved to the Istrian Peninsula in the late medieval or early modern period and is generally not included in Balkan linguistic studies. For Romani the two dialects relevant for the Balkans are known as Balkan and Vlax, which can be the source of some confusion. The Balkan dialects are those that remained in the Ottoman heartland (see §1.1.4) and adjacent regions (these divided into two subgroups labeled I and II), while speakers of what became the Vlax dialects migrated north of the Danube into what is now Romania and spent enough time there to diverge in certain diagnostic features. Subsequently, one group migrated south back into the Ottoman Empire, while another group migrated northward to what were then the Austro-Hungarian, German, and Russian Empires.

3. Phonology

There are no phonological developments that are characteristic of the Balkan languages as a group. Among the various candidates for such status have been the presence of stressed schwa; rhotacism; a simple five-vowel system (a, e, i, o, u); the lack of suprasegmental features such as length, tone, and nasality; etc. None of these, nor any of the other proposed candidates, stands up to scrutiny. Stressed schwa, for example, which is found in many, but not all, Balkan languages or dialects, has different origins not only in the individual languages but even in the dialects of a single language. Thus, in northern Macedonian stressed schwa comes from short high vowels, in east central Macedonian it is from vocalic /l/, in southwestern Macedonian it is from a back nasal vowel, and in some other peripheral dialects it is limited in occurrence to the position before older vocalic /r/. Schwa is also absent from the west central dialects on which Standard Macedonian is based. Rhotacism is from various sources and occurs at different time periods in the languages that have it; many Balkan dialects have more than the abovementioned five vowels; mere absence of a feature is not in and of itself diagnostic of anything; and so on.

It is also true, however, that local dialects of languages in contact do show specific phonological convergences. Thus, while it can be said that there is no Balkan phonology, there are Balkan phonologies (Friedman, 2008a). A striking example is the convergence of the loss of nasality in Debar Albanian and Macedonian. Debar Albanian is the only Geg dialect with no nasal vowels whatsoever, and Debar Albanian and Debar Macedonian share the development of a rounded mid-back nasal vowel into a denasalized equivalent. In both dialects āN[C] > õ[C] > å[C]. For Albanian āN[C] > ã is a process pre-dating the Geg-Tosk split, and for Slavic āN[C] > open õ is Early Common Slavic. The rounding of ã to open õ in much of Geg is likewise an Albanian process. It is likely that this was the situation when Albanian and Slavic speakers encountered one another in the Debar region. The striking nature of the convergence is seen in two facts. First, while some Geg dialects in Macedonia have lost the other nasal vowels, open õ from ã is always preserved except in Debar. Second, while Common Slavic open õ was denasalized to an unrounded vowel everywhere else in western Macedonia, it was only in the Debar region that denasalization yielded ɔ (open o, å). Thus, while the regular sound changes are specific to the respective languages, there is also a convergence. Trummer (1983) speculates on various possible substratum effects on Balkan Slavic in general.

Other examples of convergent phonology are seen in Kosovo, where local Albanian, Turkish, and Slavic dialects all lose /h/ as well as the opposition between mellow and strident palatal affricates. Aromanian in Greece has adopted the interdental fricatives of Greek, and in the local Macedonian dialects, such interdental fricatives occur not only in Greek loans but even sometimes in native words. Romani dialects in Greece tend to replace palatal affricates with dental ones owing to the influence of Greek, and the Greek dialects around Kastoria (Macedonian Kostur) preserve homorganic nasals before voiced stops just as do the local Macedonian dialects (Papadamou & Papanastassiou, 2013). Other examples could be cited, but these suffice to show that regularity of sound change, so fundamental to the principled study of genealogical linguistics, can, within certain parameters, also obtain under conditions of language contact. Nonetheless, such examples are strictly localized and limited. In general, dialectal phonology serves an emblematic function and is, overall, resistant to contact-induced change. The exceptions are so limited and specific that the rule retains its probative validity.

4. Morphology

As with phonology, so, too, with inflectional morphology, shared innovation is characteristic of the language family rather than the linguistic league, and examples of contact-induced change in the Balkans are rare and local. The most frequently cited example of borrowed inflectional morphology is Capidan’s (1925, pp. 159–161) speculation that the Meglenoromanian 1sg and 2sg present markers -m and -š (for example, aflum, afliš, but also aflăm, aflăš ‘find’) come from contact with the local Macedonian dialects, where these same markers are used. On more careful examination, however, it turns out that these desinences are not found in the villages with the most intensive contact with Macedonian; they only occur in a few verbs where dropping the final vowel would result in a consonant cluster, and 1sg -m and 2sg -š have possible sources in other conjugations (see Friedman, 2012a for details). Thus, while Macedonian may have had an influence, it is not an unambiguous source. An unambiguous example is provided by the Meglenoromanian of Tsărnarekă, the most heavily Slavicized Meglenoromanian village, where the Macedonian gerundive marker -ajkji has been borrowed and attaches to native stems, for example nirdzeajkji ‘going’ (Atanasov, 2002, p. 235). Another clear example is that of the 1pl and 2pl preterite desinences in some Romani dialects that have borrowed the corresponding markers from Turkish. Romani verbs usually form the preterite stem by adding a consonant, often -d-, to the root, for example ker- ‘do’ > kerd-, 1pl preterite kerdam, 2pl preterite kerdan~kerden (in the relevant dialects). This preterite often corresponds to the Turkish preterite in -DI, for example, kır- ‘break’ preterite 1sg kırdım, 2sg kırdın, 1pl kırdık, 2pl kırdınız, vur- ‘hit’ vurdum, vurdun, vurduk, vurdunuz. The relevant dialects of Romani borrow the /-Vz/ of the 2pl preterite as -ə(s) or -us, attach it to the native 2sg preterite, and sometimes extend it analogically to the 1pl preterite, for example 1pl kerdaməs, 2pl kerdanəs. Most of these speakers are in eastern Bulgaria and also speak Turkish, but in at least one case (Agia Varvara, near Athens), Turkish is no longer spoken (Elšík & Matras, 2006, p. 136). These dialects, and a number of others, also conjugate borrowed Turkish verbs using Turkish inflection (see Friedman, 2013a for details).

Derivational morphology, being basically lexical in nature, is well attested in terms of Balkan commonalities, with affixes borrowed from Greek, Latin, Slavic, and Turkish. Among the most widespread derivational affixes common to all the Balkan languages are the Turkish suffixes -CI ‘agentive noun,’ -lIK ‘abstract noun,’ and -lI ‘attributive’ (capital letters indicate sounds subject to alternations of voicing and vowel harmony). These suffixes occur not only in numerous borrowings from Turkish, but also as productive suffixes on native roots and new loanwords, for example, Balkan Slavic lov- ‘hunt,’ lovdžija ‘hunter,’ Greek taksidzēs ‘taxi driver,’ Judezmo sedaka ‘charity’ sedakadžis ‘beggar,’ Romanian varvarlîk and Aromanian varvarlike ‘barbarism,’ Albanian Skraparlli ‘person from Skrapar (a town in Albania),’ etc.

5. Morphosyntax

The shared morphosyntactic convergences of the Balkan languages first brought those languages as a group to the attention of West European linguists. In an adaptation of Kopitar’s (1829, p. 86) famous formulation, it can be said that at this level the Balkan languages give the impression of having a single grammar with various lexicons. It is often the case that a word-for-word translation from one Balkan language or dialect into another will be completely grammatical and even idiomatic. Such shared phenomena resulting from contact-induced convergence were first called Balkanisms by Seliščev (1925). There are many such features, and a few of the most salient will be examined here. For the most part, morphosyntactic Balkanisms involve the use of native material to express convergent grammatical processes, although in some instances the material itself can be borrowed.

5.1. Analytic Subjunctives

All the Indo-European Balkan languages have analytic subjunctives that replaced earlier infinitives. These analytic subjunctives, which are formed by means of a native particle (Balkan Slavic da, Albanian të, Romani te, Balkan Romance să, si, s’, Greek na) plus a finite verb, are used as complements, but can also stand alone as optatives or desideratives. Moreover, in the West Rumelian Turkish dialects, the optative is often used as a calque on such usage, where standard Turkish would require an infinitive. Table 1 is illustrative.

Table 1. Balkan Analytic Subjunctives

The replacement of the infinitive with an analytic subjunctive has been a process realized to different degrees in the various Balkan languages. Thus the inherited infinitive has completely disappeared in western Macedonian, Torlak, and Romani. Remnants of the form but not the function survive in Modern Greek in the formation of the perfect series. In some parts of southeastern Macedonian, Meglenoromanian, and Bulgarian, traces of the infinitive survive in a few expressions. Romanian still has an infinitive of limited use, and some traces (other than the verbal noun, which, as in the situation in Modern Greek, is of infinitival origin but not function) appear to have survived in some Aromanian dialects (Joseph, 1983, pp. 175–176). Joseph (1983, pp. 93–94) argues that the most likely scenario for Albanian is that, like the other Balkan languages, it inherited some sort of infinitive from Indo-European and then replaced it. In addition to the analytic subjunctive, Modern Geg has a new infinitive formed with me plus a short participle, while Tosk has a construction with për të plus long participle that has limited infinitival functions. Judezmo has preserved the infinitive, but it has subjunctive usages with ke that are Balkan rather than Spanish, as in example (1):


All Balkan analytic subjunctives share the fact that they replaced infinitives after the medieval period, that is, during the Ottoman period (cf. Joseph, 1983, pp. 83–83; Asenova, 2002, p. 214).

5.2. Future Marking

The Indo-European Balkan languages have all developed innovating analytic futures using particles derived from native verbs meaning ‘want.’ As in the replacement of infinitives with analytic subjunctives, with which future formation is connected, for those languages for which we have the attestations, we can see that the main processes leading to the current shapes occurred during the Ottoman period (Asenova, 2002, p. 214). Although future constructions of the type ‘want’ + infinitive are attested in the Middle Ages for Greek, Romance, and Slavic, it was only and precisely in the Balkans that this type of future marking became the sole or dominant type of paradigm. The development took place in four stages: (1) conjugated ‘want’ + infinitive, (2) conjugated ‘want’ + analytic subjunctive, (3) particle based on ‘want’ + analytic subjunctive, (4) particle + finite verb.

Table 2 is illustrative.

Table 2. Balkan Futures

Notes: Standard Albanian requires the subjunctive particle, but it is omitted colloquially. Central and Northern Geg often have conjugated ‘have’ + infinitive but also have the option of using the Standard Albanian future, and some dialects use conjugated ‘want’ + infinitive. Some Torlak dialects preserve a distinct 1sg future marker (ču) but use the invariant marker elsewhere. Romanian has competing constructions using a conjugated auxiliary based on ‘want’ (1sg standard voi, regional oi) or ‘have’ (1sg/pl am) + bare infinitive. In Meglenoromanian, the future marker has been assimilated into the subjunctive marker except in Tsărnareka, where the particle ăs is used.

The competition between the future of volitional origin using ‘want’ and the future of necessitative origin using ‘have’ has resulted in some additional complexities. As indicated in the note to Table 2, some Geg and some Romanian dialects have a future using conjugated ‘have’ plus analytic subjunctive or infinitive (see Friedman, 2005 for details on Albanian). In Balkan Slavic, the negative of ‘have’ (Macedonian nema, Standard Bulgarian njama) functions as a negative existential that can combine with the analytic subjunctive to form a negative future. This construction has been calqued into some Aromanian, Romani, and Turkish dialects in contact with Macedonian, as seen in Table 3.

Table 3. Negated ‘Have’ Futures

5.3. Anterior Future as Conditional

The anterior future is formed by a combination of imperfect marking and future marking, with individual languages showing differences as to where the two markings occur in the paradigm. This construction replaced older inherited conditionals in Albanian, Greek, Balkan Romance, Balkan Slavic, and Romani, in some instances only partially, in others completely. Gołąb (1964) and Belyavski-Frank (2003) are the classic works on this subject and have the complex dialectological details. The combination of future+preterite marking also has conditional functions in Turkish. (Erdal, 2004, pp. 270, 520 shows future+past copula for some conditional meanings in Old Turkic.) As with future formation, these developments in the Indo-European Balkan languages took shape during the Ottoman period. Table 4 is illustrative.

Table 4. Balkan Anterior Future as Conditional (impf ‘imperfect’)

5.4. Analytic Adjectival Gradation

Analytic gradation in the Balkans is another feature for which the definitive developments took place during the Ottoman period. Table 5 is illustrative.

Table 5. Balkan Adjectival Gradation

In the case of Romani, only the dialects in the Balkans have eliminated older synthetic comparatives in -eder, and in all instances have borrowed analytical markers from contact languages. Although some analytism developed in Slavic outside the Balkans, those patterns are different, and the non-Balkan dialects are quite conservative in this respect, showing that the Balkan Slavic development took place in contact with the other Balkan languages.

5.5. Object Reduplication

All of the Balkan languages use accusative and dative resumptive clitic pronouns agreeing in gender-number-case with certain direct and indirect objects, a phenomenon which, in Balkan linguistics, is commonly called object reduplication. Example (2) from Macedonian is illustrative:


Each of the Balkan languages has different rules concerning when object reduplication is expected or required. Thus, for example, according to the Macedonian norm all definite direct and all indirect objects are supposed to be reduplicated, whereas according to the Bulgarian norm, reduplication is generally relegated to topicalization in the colloquial and is not permitted in formal writing. As Kallulli (1999) has shown, object reduplication functions to provide topicalization in Greek and Albanian, but the two languages differ insofar as the reduplication is sometimes required in Albanian when it is facultative in Greek. Aromanian dialects follow the facts of the dominant contact language: Greek in Greece and Macedonian in Macedonia. Meglenoromanian also behaves like Macedonian. In Romanian, however, reduplication is limited to definite or personal pronouns, preverbal definite direct and all indirect substantival objects, and postverbal objects governed by pe (i.e., humans). In Romani, reduplication functions to mark possession and focus. Friedman (2008b) provides a survey. Although left dislocation is attested in Ancient Greek and the Hellenistic Koine, it is not until the post-classical period—that is, when there was contact with Vulgar Latin, which shows evidence of object reduplication—that it becomes more robust in Greek. Attempts have been made to show object reduplication as present in Old Church Slavonic, but the scantiness and marginality make it clear that the current situation also has its origins in the Ottoman period, when there was intensive language contact. Balkan Judezmo also has object reduplication where Iberian Spanish would not, as seen in (3a) with the Macedonian equivalent given in (3b):


Although there is some reduplication in Iberian Spanish and elsewhere in non-Balkan Romance, Wagner (1914, pp. 130–131) observes that reduplicated object pronouns occur more frequently in Constantinople Judezmo than in Spanish, and Sandfeld (1930, p. 192) makes it clear that the Balkan phenomenon does not involve dislocation but rather is more integral to the clause as a whole.

5.6. Declension Simplification

The elimination of nominal declension is a feature that is often cited as a typical Balkanism, and for Slavic this is definitely the situation. When Slavic entered the Balkans, it had all the inflectional cases inherited from Indo-European except the ablative (which had merged with the genitive, and in most instances, replaced its desinences). By the beginning of the 20th century, there was a region stretching from eastern Macedonia and adjacent parts of Greece into western Bulgaria where the local dialects had eliminated all vestiges of nominal declension except in the pronouns, where some accusatives and datives were preserved. The loss is ongoing, and some pronominal datives and accusatives are in the process of being replaced among younger speakers, much to the dismay of older speakers. On the other hand, Balkan Romance is unique among the Romance languages in having preserved some nominal declension, the basic opposition being nominative/accusative vs dative/genitive. Greek and Albanian both have at least some distinct accusatives and dative/genitives in their nominal systems. Albanian also has a distinct indefinite ablative plural, and traces of a locative survived into the modern period. It is also worth noting that in Albanian, while the desinences of the dative and genitive are identical (except for the ablative plural just mentioned), the genitive is always preceded by a particle of concord, while the dative-ablative never is. The syncretism that is considered particularly Balkan is the merger of dative and genitive. Historically, however, the genitive replaced the dative in Greek and the dative replaced the genitive in Albanian, Romance, and Slavic (Demiraj, 1986, p. 257). Romani, on the other hand, has all of the same eight cases as were in Indo-European, although most of them are now marked by innovating postpositions (see Friedman, 1991; Matras, 2002, pp. 80–94).
It is only in the late 20th and early 21st centuries that, in Macedonia and Bulgaria, the Romani case system has begun to erode, showing a merger of dative and locative, analytic expression of the ablative and dative/locative, and the ablative expanding into domains of the genitive. Balkan Turkish preserves the same cases as the rest of Turkish, although there is some dative/locative confusion in West Rumelian Turkish, which reflects language contact.

5.7. Referentiality

The postposed definite article of Balkan Romance, Balkan Slavic, and Albanian was one of the first recognized Balkanisms (Leake, 1814). The postposing of the article in Balkan Romance is unique in Romance. The development of a genuine definite article, and a postposed one at that, is specific to Balkan Slavic within Slavic. A superficially similar phenomenon in North Russian dialects is not a true article (Koduhov, 1953), and the use of demonstratives as definiteness markers in colloquial Slovene, Czech, and some Polish, all under German influence, is both less developed and a separate phenomenon. Hamp (1982) suggests that postposed definite marking was already present in the ancestor of Albanian, but the question of whether preposing or postposing is older is not settled. (See Demiraj, 1986, pp. 297–336 for a discussion of the arguments.) Greek had already developed a demonstrative into a preposed article before the arrival of Latin, and the Romani preposed definite article, which superficially resembles the Greek, probably developed out of native materials, albeit in contact with Greek (Sampson, 1926, pp. 151–153, 247–249; Matras, 2002, pp. 96–98).

Much less attention has been paid to the marking of indefiniteness, but this, too, has arguable Balkan relevance. While the development of unstressed ‘one’ into an indefinite article is broadly attested, it was absent from Latin, Ancient Greek, and Old Church Slavonic but present in Orkhon Turkic. Pan-Romance usage points to a possible Latin impetus in a Balkan context, aided perhaps by Turkish and Albanian. Greek, Balkan Slavic, and Romani all clearly developed such usage in a Balkan context, and their usage of ‘one’ as an indefiniteness marker is statistically about half as frequent as in the other languages (Friedman, 2003). For Slavic, it is still absent from East Slavic. Those languages in contact with German have developed similar usages in their colloquial registers, and outside the Balkans, Romani patterns with its major contact languages, but the Balkan usage of ‘one’ as an indefiniteness marker clearly developed in Balkan Slavic during the Ottoman period.

5.8. Evidentials

Balkan Slavic, Albanian, Meglenoromanian, Romanian, and some dialects of Aromanian, Judezmo, and Romani all, like the Turkic languages, have means of marking on the verb the speaker’s commitment to the truth of the statement. While the source of this commitment (or lack thereof) is frequently based on actual evidence, that is, witnessing, a report, inference, etc., this is not always literally the case. These languages thus employ what Aikhenvald (2003) calls evidential strategies. Example (4) from Macedonian is illustrative (cnf ‘confirmative,’ ncnf ‘nonconfirmative’):


The speaker is basing both statements on the single report by his aunt on the telephone, but while he uses a synthetic past (confirmative) to report his uncle’s absence, he uses an old perfect (nonconfirmative) to express his whereabouts, since he is certain his uncle would have come to the phone, but not that he is actually at the beach.

The situations in Balkan Slavic and Turkish are quite similar: the Balkan Slavic synthetic pasts (aorist and imperfect), like the Turkish past tense in -DI, positively assert the speaker’s commitment to the truth of the statement. The Balkan Slavic old (i.e., original Common Slavic) perfect in -l, like the Turkish perfect in ‑mIş, does not express such confirmation. There are also strategies to avoid making this choice, but they differ among the languages. However, nonconfirmativity can also be actively expressed. This is contextually the case with the Balkan Slavic l-past and the Turkish mIş-past and always the case with the newer (i.e., post-Common Slavic) Balkan Slavic paradigms using auxiliaries based on the old perfect in -l. In Albanian, Balkan Romance, and Romani, it is precisely nonconfirmativity for which there are special verbal markers. The nonconfirmative complex consists of three sets of meanings, which can be described in Austin’s (1962) terms as marked felicitous, marked infelicitous, and neutral. A neutral nonconfirmative renders a report or inference. A marked felicitous nonconfirmative is dubitative, that is, it expresses disbelief, doubt, irony, etc. A marked infelicitous nonconfirmative is admirative, that is, it expresses surprise at a fact that would have contradicted the speaker’s earlier expectations. It is infelicitous in the sense that it actually expresses confirmation, but this confirmation refers to a previous state when the speaker would have withheld confirmation. Albanian has a set of nonconfirmative paradigms referred to in Albanian as habitore (habi ‘surprise’), translated as ‘admirative’ (from the French admiratif used by Dozon, 1879, p. 226; see also Friedman, 2012b). Thus for example the Albanian sentence Ai kërcyeka mirë. ‘He dances well’ can, depending on context, express surprise at a newly discovered fact (admirative sensu stricto), ironic rejection of a previous statement (dubitative), or a neutral report. 
As with the Balkan Slavic and Turkish phenomena, the Albanian admirative has its origins in an older perfect. However, whereas the Balkan Slavic and Turkish nonconfirmatives are meanings attached to the perfect as a result of contrast with the confirmative paradigms, in Albanian the original perfect, for example ka kërcyer literally ‘has danced,’ was inverted from auxiliary+participle to participle+auxiliary, which then acquired a present meaning and served as the basis for new paradigms, for example ka pasur ‘has had’ (perfect), paska ‘has’ (admirative present), paska pasur ‘has had’ (admirative perfect). Based on the evidence in the oldest Albanian texts (16th–17th century), as well as dialectological evidence, it is clear that the current complex of meanings in Albanian developed during the Ottoman period. This same complex of meanings has attached to the inverted (i.e., participle+auxiliary rather than auxiliary+participle) perfect and pluperfect in Meglenoromanian, for example perfect am măncată ‘I have eaten,’ inverted perfect măncat-am ‘[Apparently, To my surprise, It is said [± but I don't believe it]] I ate/have eaten.’ In the Meglenoromanian pluperfect, however, the plain indicative is formed with the imperfect of the auxiliary plus the participle, for example, vḙam măncată ‘I had eaten,’ while the admirative type is formed with the admirative (i.e., inverted perfect) of the auxiliary + participle, for example vut-am măncată ‘[Apparently, To my surprise, It is said [± but I don't believe it]] I had eaten’ (Atanasov, 2002, pp. 240–247). Here the impetus could have come from Macedonian or Turkish or both. As with Albanian, the postposing of the auxiliary in the perfect was a pre-existing option, and the nonconfirmative meanings became attached to this form during the Ottoman period.
The Frasheriote Aromanian dialect of Gorna Belica (Frasheriote Aromanian Bela di sus) in the Republic of Macedonia, whose speakers came to their current location from central Albania, has borrowed the Albanian third-singular present admirative marker -ka, which it attaches to a native participle in order to form a set of paradigms with exactly the same structure and meanings as the Albanian admirative. In Romanian, the modul prezumtiv ‘presumptive mood’ carries exactly the same nonconfirmative complex of meanings, but uses a future (or conditional or subjunctive) particle plus invariant fi ‘be’ plus the gerund or the past participle. It thus represents a morphologically distinct but semantically identical development. In some Balkan dialects of Romani in Macedonia and Bulgaria, an interrogative particle can be attached to a finite verb to express the same nonconfirmative complex or some portion of it, that is, disbelief (dubitativity), and, in Bulgaria, also admirativity and reportedness (Friedman, 2013b; Igla, 2006). The Romani developments have taken place in contact with Balkan Slavic and Turkish. Finally, in the Judezmo of Istanbul, the pluperfect is used for unwitnessed, reported, and inferred events as a calque on the Turkish mIş-past, in contexts where Spanish would not permit a pluperfect (Varol, 2001). At issue is rendering the sense of epistemological distance conveyed by a nonconfirmative by using a form marked for taxis. Balkan evidentiality thus represents a complex picture of both contact and parallel development, but it is located firmly within the Ottoman period, except for Turkish itself, for which such distinctions are attested in the oldest Turkic records (8th century ce).

5.9. The Teens

The construction of teens as ‘numeral-on-ten’ in Balkan Slavic, Balkan Romance, and Albanian, for example Old Church Slavonic edinŭ na desętŭ, Albanian një.mbë.dhjetë, Romanian un.spre.zece ‘eleven,’ literally ‘one.on.ten,’ is often cited as an example of Slavic influence in the Balkans, since the construction is shared by all of Slavic. However, Hamp (1992), on the basis of the fact that ‘ten’ is masculine in Old Church Slavonic, neuter in Latin, but feminine in Albanian and Romanian, argues that this is one of a number of older areal features shared by the Northwest Indo-European dialects that became Albanian and Slavic, and which, together with Germanic, were in contact north of the Carpathians prior to migrations to the Balkans. If this was the case, then the Romanian construction joins other features and lexical items that it shares specifically with Albanian and should be attributed to the pre-Latin period.

6. Syntax

Joseph (2001) makes a useful distinction between comparative Balkan syntax and formalist (generative) syntax of the Balkan languages. The former examines those features that can arguably be considered as resulting from language contact. The latter engages in the basically typological project of seeking linguistic universals and happens to do so using languages in the Balkans. The former is the concern of Balkan linguistics, and such phenomena are surface-oriented. Some typical examples will be considered here.

6.1. Clitic Order

All the Indo-European Balkan languages began with a clitic order that did not permit clitics in absolute initial position. Over time, Greek, Balkan Romance, and Albanian became congruent in requiring pronominal (and some other) clitics to come immediately before the verb, even if this means that they are sentence initial. The examples (5a–d) and (6a–d), from Alexander (2000), illustrate the relative positions of Macedonian, Bulgarian, and BCMS vis-à-vis Balkan clitic order. In BCMS, these clitics come after the first stressed word in the phrase, and in Bulgarian, they come immediately before the verb unless that would result in sentence initial position, in which case they must follow, while Macedonian is exactly congruent with the non-Slavic Balkan languages.


6.2. Constituent Order

Balkan word order is relatively free. Still, there are certain tendencies in the respective languages, and contact varieties show convergence. A particularly striking example is West Rumelian Turkish. Example (7a–d), based on Ibrahimi (1982, p. 53), gives the West Rumelian Turkish (a), Macedonian (b), Albanian (c), and Standard Turkish (d) of a sentence that illustrates a variety of word order differences, including genitive-head constructions, main and subordinate verb position, and infinitive versus subjunctive usage. The following non-Leipzig abbreviations are necessary here: pc ‘particle of concord,’ sp ‘subjunctive particle.’


7. Lexicon

Until the Second World War, the bulk of Balkan linguistic inquiry was concerned with lexicon. Thus, for example, Sandfeld’s (1930) classic book devoted 40% of its space to shared vocabulary. By contrast, Asenova’s (2002) book of about the same size devoted only 10% of its space to that topic. These studies have concentrated on what Trubetzkoy called Kulturwörter ‘culture words,’ and indeed much of the common Balkan vocabulary has such associations. Thus, for example, much of the oldest layer of Balkan vocabulary, shared by Albanian, Greek, Balkan Romance, and Balkan Slavic, is associated with domestic life and pastoralism, for example Albanian shtrungë, Aromanian strungã ‘enclosure for milking sheep,’ Romanian strungă ‘enclosure for milking sheep, narrow passage,’ Macedonian and West Bulgarian strunga ‘enclosure for milking sheep or separating them from lambs’ (also Bulgarian străga/stărga), Greek (Epirus and Sarakatsan) stroúnga ‘dairy,’ or Albanian drugë, Aromanian drugă, Greek druga/dhruga, Balkan Slavic (dialectal) drug, BCMS druga ‘wooden bobbin, distaff.’ These words do not have etymons in any of the anciently attested Balkan languages, that is, Greek, Latin, and Old Church Slavonic, and so it is generally assumed that they come either from the ancestor of Albanian or some other ancient language of the Balkans. There are about 20 such words, depending on which etymologies one accepts. (See Neroznak, 1978, pp. 186–216; Katičić, 1976 remains authoritative, cf. Woodard, 2004; Hamp, 2007 is also an important source.) There are also about 70 words that Albanian and Romanian share, some of which are clearly Indo-European but not of Romance origin (i.e., their etymons are absent from Latin), for example Albanian dru ‘wood,’ Rmn druete ‘woods,’ or Albanian buzë, Rmn buză ‘lip’ (see Kalužskaja, 2001 for a complete study). It is assumed that these go back to a pre-Latin language of the Balkans.
Latin had a position of imperial prestige in the ancient Balkans comparable to that of Turkish in the early modern and modern periods, and it supplied loanwords at all levels, for example, Latin furca ‘fork’: Rmn furcă ‘pitchfork,’ Balkan Slavic furka ‘spindle,’ Albanian furkë ‘pitchfork, spindle,’ MGk fourka ‘gallows’; Vulgar Latin *furnu ‘oven’: Greek fournos, Aromanian furnu, Balkan Slavic furna, Albanian furrë, Turkish furun. Although the prestige of Ancient Greek affected Classical Latin, there are very few Ancient Greek words in the Balkan languages; among the handful found in Albanian are mokërë (Geg mokënë) ‘millstone’ from Doric mākhānā and tarogzë ‘helmet’ from thōrākion ‘breastplate, armor.’ The conversion of the Balkan Slavs to Christianity brought in a significant number of Greek loans, and Slavic served as an intermediary for such loans into Romanian, on which it had a large lexical impact, since Church Slavonic was the language of literacy for Romanian Orthodox Christians for centuries. Slavic also had a significant lexical impact on Albanian (Ylli, 1997–2000), but considerably less so on Greek, despite there being Slavic speakers all over Greece and as far south as the southern Peloponnese as late as the 13th or 14th centuries (Fine, 1994, p. 166). The Fourth Crusade in the early 13th century brought a new wave of Romance lexicon to the Balkans, and the Ottoman conquest brought Turkish, which contributed vocabulary to all lexical fields and all parts of speech. Owing to the social positions of Roms and Jews in the Balkans, Romani has contributed mainly to slang and secret lexicons, while Judezmo is more or less absent from the common Balkan lexicon. (See also §4 above on derivation.)

In addition to culture words, there is a layer of Balkan vocabulary that Friedman and Joseph (2014) argue is the diagnostic feature of a Sprachbund. They describe this vocabulary as words Essentially Rooted In Conversation (ERIC). ERIC loans require the kind of intensive, multilingual, face-to-face interaction that is characteristic of Sprachbund formation. Examples of such words are kinship terms, numerals, pronouns, adpositions, negators, interrogatives, complementizers, discourse particles, interjections, insults, and idiomatic and everyday expressions. A typical example is the colloquial greeting on seeing someone the speaker knows well, given in Table 6. The meaning is ‘What are you doing?’ and the response is ‘Well.’

Table 6. Colloquial Balkan Greetings and Responses

8. The Balkans and the Rest of Europe

As a multilingual contact situation with millennia of documentation, the Balkans represent a unique resource for the study of language contact. At the same time, it is precisely the time-depth that makes it possible to identify the early modern period, that is, the Ottoman Empire, as formative for most of the Balkan linguistic phenomena as we observe them today. Since the fall of the Iron Curtain and the rise of the European Union, it has become fashionable to posit all of Europe as a linguistic area. Here Hamp (1989) was prescient when he characterized Yugoslavia as a “crossroads of Sprachbünde” not long before that country descended into war. As Hamp pointed out, Yugoslavia was characterized linguistically by “a spectrum of differential bindings, a spectrum that extends in different densities across the whole of Europe and beyond.” It is the “differential bindings” and “different densities” that are crucial. The historicity that Hamp (1977) pointed out is vital to understanding the processes involved. There are many contact-induced convergences that occur in Europe and beyond it, and in many cases, it is the fact of convergence, not the question of directionality, that is probative of a Sprachbund. In this sense, the Balkan Sprachbund represents a unique combination of linguistic factors constituting a distinct phenomenon.


The summary for this article was written and some of the research for it conducted while I was an Honorary Visitor at the Center for Research on Language Diversity, La Trobe University.

Digital Materials

Aromanian, Meglenoromanian, and Old Romanian materials (Note: scroll down to get to the links on this page)

Bulgarian Dialectology as Living Tradition.

Macedonian Academy of Arts and Sciences, Research Center for Areal Linguistics.

Jewish Language Research Website.

Romani Morpho-Syntax Database.

Reference grammars for BCS, Bulgarian, Macedonian, and Romanian as well as outlines of Albanian and Romani

Dictionaries: There are many online dictionaries. For the Balkan languages, these items are particularly useful. Note that the Macedonian resource can supply English-Macedonian, and Turkish and Albanian equivalents are often given.

Further Reading

Friedman, Victor A. (2013). The languages of the Balkans. Oxford Bibliographies Online: Linguistics, ed. Mark Aronoff.

This annotated bibliography is a thorough guide to the field. Because the author of the present article is also the author of this bibliography, he reproduces here a statement from one of the anonymous readers who evaluated it for publication. The comment was sent to the author by e-mail on February 12, 2013.

