Victor A. Friedman
The Balkan languages were the first group of languages whose similarities were explained in modern linguistic terms as a result of language contact rather than as a result of descent from a common ancestor. Nikolai Trubetzkoy coined the term Sprachbund ‘linguistic league’ (as opposed to Sprachfamilie ‘language family’) to describe this relationship. Balkan linguistics, as both a subset of and precursor to contact linguistics, is, at its base, an historical linguistic discipline. It seeks to explain similarities among the relevant languages as the result of diffusion rather than of either transmission or putative universal, typological properties of human language (the latter explanation assumes parallel developments whose causation is ahistorical, i.e., unconnected with either contact or ancestry). The relevant languages are, with the exception of Turkic, all part of the Indo-European language family, but they belong to five distinct groups that are known to have been separated for a significant length of time (presumably millennia). Moreover, for four of the five Indo-European groups as well as for Turkic, there exists documentation that goes back more than a millennium, and in some cases several millennia. The Balkan languages are thus the oldest example of a well-documented and still living Sprachbund.
The primary questions that Balkan linguistics seeks to answer are these: What are the results of language contact in the Balkan languages, and how did they come about? The Balkan languages are traditionally defined as Albanian, Modern Greek, Balkan Romance (Romanian, Aromanian, and Meglenoromanian), and Balkan Slavic (Bulgarian, Macedonian, and the southernmost dialects of the former Serbo-Croatian). In recent decades, it has been recognized that the relevant dialects of Romani, Judezmo, and Turkish and Gagauz also participate in at least some of the convergent processes that are taken as definitive of the Balkan linguistic league. While the language family is defined by regular sound correspondences, which in turn help define shared morphology and a core lexicon, the Balkan linguistic league is defined principally by shared morphosyntactic developments and a shared lexicon of borrowings often called “cultural.” In the Balkan linguistic league, phonological developments are sometimes shared among different languages at the dialectal level, but there are no such features that characterize the Balkan languages as a group. Just as in the language family not every diagnostic item is represented in every branch, so, too, in the Balkan linguistic league not every feature is equally represented in all languages and dialects.
Among the most characteristic morphosyntactic features are the following: (1) replacement of infinitives by analytic subjunctives, (2) the use of a particle derived from etymological ‘want’ to mark the future, (3) replacement of synthetic gradation of adjectives with analytic constructions, (4) replacement of conditionals by anterior futures, (5) resumptive clitic pronouns for certain direct and indirect objects, (6) various simplifications in the declensional system, (7) postposed definite articles (for Balkan Slavic, Balkan Romance, and Albanian), and (8) grammaticalized evidentials (Balkan Slavic, Albanian, Turkic, and to some extent Balkan Romance and Romani). While some of these convergences began in the ancient or medieval periods, the Balkan linguistic league took its definitive modern shape during the centuries of the Ottoman Empire (14th to early 20th centuries).
This is an advance summary of a forthcoming article in the Oxford Research Encyclopedia of Linguistics.
About 7,000 languages are spoken around the world today. The actual number depends on where the line is drawn between language and dialect, an arbitrary decision because languages are always in flux. But specialists applying a reasonably uniform criterion across the globe count well over two thousand languages each in Asia and Africa, while Europe has just shy of three hundred. In between are the Pacific region, with over thirteen hundred languages, and the Americas, with just over one thousand. Many of the world’s languages are spoken by small populations and are thought likely to disappear over the next few decades, as speakers of endangered languages turn to more widely spoken ones.
The languages of the world are grouped into 141 language families, based on their origin, as determined by comparing similarities among languages and deducing how they evolved from earlier ones. While the world’s language families may well go back to a smaller number of original languages, even to a single mother tongue, scholars disagree on how far back current methods permit us to trace the history of languages.
While it is normal for languages to borrow from other languages, occasionally a totally new language is created by mixing elements of two distinct languages to such a degree that we would not want to identify one of the source languages as the mother tongue. This is the situation with Media Lengua, a language of Ecuador formed through contact among speakers of Spanish and speakers of Quechua. In this language, practically all the word stems are from Spanish, while all of the endings are from Quechua. Just a handful of languages have come into being in this way, but a less extreme form of language mixture has resulted in several dozen creoles around the world. Most arose during Europe’s colonial era, when European colonists used their language to communicate with local inhabitants, who in turn blended vocabulary from the European language with grammar largely from their native language. These so-called creole languages became so well established that they were passed on to the next generation, becoming a first language to many people, and continuing in use to this day.
Also among the languages of the world are about three hundred sign languages, used mainly in communicating with the deaf. The structure of sign languages typically has little historical connection to the structure of nearby spoken languages.
Languages have also been constructed expressly, often by a single individual, to meet communication demands. The prime example is Esperanto, designed to serve as a universal language and used as a second language by some two million people, according to some estimates. But there are hundreds of others falling under the rubric of constructed international auxiliary languages.
Nora C. England
Mayan languages are spoken by over 5 million people in Guatemala, Mexico, Belize, and Honduras. There are around 30 different languages today, ranging in size from fairly large (about a million speakers) to very small (fewer than 30 speakers). All Mayan languages are endangered given that at least some children in some communities are not learning the language, and two languages have disappeared since European contact. The Maya developed the most elaborate and most widely attested writing system in the Americas (starting about 300 BC).
The sounds of Mayan languages consist of a voiceless stop and affricate series with corresponding glottalized stops (either implosive or ejective) and affricates, glottal stop, voiceless fricatives (including h in some of them inherited from Proto-Maya), two to three nasals, three to four approximants, and a five-vowel system with contrasting vowel length (or tense/lax distinctions) in most languages. Several languages have developed contrastive tone.
The major word classes in Mayan languages include nouns, verbs, adjectives, positionals, and affect words. The difference between transitive verbs and intransitive verbs is rigidly maintained in most languages. The two classes usually, but not always, use the same aspect markers. Intransitive verbs only indicate their subjects while transitive verbs indicate both subjects and objects. Some languages have a set of status suffixes which is different for the two classes. Positionals are a root class whose most characteristic word form is a non-verbal predicate. Affect words indicate impressions of sounds, movements, and activities. Nouns have a number of different subclasses defined on the basis of characteristics when possessed, or the structure of compounds. Adjectives are formed from a small class of roots (under 50) and many derived forms from verbs and positionals.
Predicate types are transitive, intransitive, and non-verbal. Non-verbal predicates are based on nouns, adjectives, positionals, numbers, demonstratives, and existential and locative particles. They are distinct from verbs in that they do not take the usual verbal aspect markers. Mayan languages are head marking and verb initial; most have VOA flexible order but some have VAO rigid order. They are morphologically ergative and also have at least some rules that show syntactic ergativity. The most common of these is a constraint on the extraction of subjects of transitive verbs (ergative) for focus and/or interrogation, negation, or relativization. In addition, some languages make a distinction between agentive and non-agentive intransitive verbs. Some also can be shown to use obviation and inverse as important organizing principles. Voice categories include passive, antipassive and agent focus, and an applicative with several different functions.
Cynthia L. Allen
Middle English is the name given to the English of the period from approximately 1100 to approximately 1450. This period is marked by substantial developments in all areas of English grammar. It is also the period of English when different dialects are the most fully attested in the texts. At the beginning of the Middle English period, the sociolinguistic status of English was low due to the Norman Invasion, and although religious texts of Old English composition continued to be copied and updated, few original compositions are extant. By the end of the period, English had regained its status as the language of government, law, and literature generally.
Although some notable changes to the phonemic inventory of consonants date from the Middle English period, the most dramatic phonological developments of the period involve vowels. The reduction of the vowels of unstressed syllables, one of the changes that marks the beginning of the Middle English period, is a phonological change with substantial morphological effects, as it greatly reduced the number of distinctive inflectional forms. Constituent order replaced case marking as the primary means of signaling grammatical relations. By the end of the Middle English period, subject-verb-object order had become established as the norm.
The lexicon of English was transformed in this period by an enormous influx of French words. The role of derivational morphology declined as its functions were to some extent replaced by the adoption of French words. Most Scandinavian loans in English first appear in the texts of this period. The Scandinavian loans are typically everyday words, while the words adopted from French are more often in areas of government, law, and higher culture, reflecting the nature of the contact between English speakers and the speakers of these languages.
The density of the Scandinavian population in the northern part of England is generally held to be responsible for the earlier appearance of changes in the north than in the south. The replacement of the third person plural personal pronoun hie by the Scandinavian they is an example of a development which is apparent only in the north early in Middle English but became general in English by the end of this period.
An important phonological development of later Middle English is the beginning of the Great Vowel Shift, a series of successive changes to the long vowels that was implemented differently in different dialects, the north-south divide being the most evident.
Early Middle English is a language that cannot be understood by Modern English readers without special study, while the language of the late Middle English period, especially that coming from the London area, can be understood with the heavy use of explanatory notes.
The Dravidian languages, spoken mainly in southern India and south Asia, were identified as a separate language family between 1816 and 1856. Four of the 26 Dravidian languages, namely Tamil, Telugu, Kannada, and Malayalam, have long literary traditions, the earliest dating back to the 1st century
A typical characteristic of Dravidian, which is also an areal characteristic of South Asian languages, is that experiencers and inalienable possessors are case-marked dative. Another is the serialization of verbs by the use of participles, and the use of light verbs to indicate aspectual meaning such as completion, self- or nonself-benefaction, and reflexivization. Subjects, and arguments in general (e.g., direct and indirect objects), may be nonovert. The copula is likewise nonovert, except in Malayalam.
A number of properties of Dravidian are of interest from a universalist perspective, beginning with the observation that not all syntactic categories N, V, A, and P are primitive. Dravidian postpositions are nominal or verbal in origin. A mere 30 Proto-Dravidian roots have been identified as adjectival; the adjectival function is performed by inflected verbs (participles) and nouns. The nominal encoding of experiences (e.g., as fear rather than afraid/afeared) and the absence of the verb ‘have’ arguably correlate with the appearance of dative case on experiencers. “Possessed” or genitive-marked N may fulfill the adjectival function, as noticed for languages like Ulwa (a less exotic parallel is the English of-possessive construction: circles of light, cloth of gold). More unusually perhaps, Kannada instantiates dative-marked N as predicative adjectives. A recent argument that Malayalam verbs originate as dative-marked N suggests both that N is the only primitive syntactic category and that the dative case plays a seminal role.
Other important aspects of Dravidian morphosyntax to receive attention are anaphors and pronouns (not discussed here; see separate article, anaphora in Dravidian), in particular the long-distance anaphor taan and the verbal reflexive morpheme; question (wh-) words and the question/disjunction morphemes, which combine in a semantically transparent way to form quantifier words like someone; the use of reduplication for distributive quantification; and the occurrence of ‘monstrous agreement’ (first-person agreement in clauses embedded under a speech predicate, triggered by matrix third-person antecedents).
Traditionally, agreement has been considered the finiteness marker in Dravidian. Modals, and a finite form of negation, also serve to mark finiteness. The nonfinite verbal complement to the finite negative may give the negative clause a tense interpretation. Dravidian thus attests matrix nonfinite verbs in finite clauses, challenging the equation of finiteness with tense.
The Dravidian languages are considered wh-in situ languages. However, wh-words in Malayalam appear in a pre-verbal position in the unmarked word order. The apparently rightward movement of some wh-arguments could be explained by assuming a universal VO order, and wh-movement to a preverbal focus phrase. An alternative analysis is that the verb undergoes V-to-C movement.
George van Driem
Several language families and a few language isolates are represented in the Himalayas, the world’s greatest massif, running a length of over 3,600 km. The most well-represented language family in this region happens to be the Trans-Himalayan language family, whose very centre of gravity and phylogenetic diversity is situated within the Eastern Himalaya. This most populous language family on our planet in terms of numbers of speakers used to be known as Tibeto-Burman but, in some circles, the family formerly also went by the names “Indo-Chinese” or “Sino-Tibetan”, the latter two labels actually designating empirically unsupported and now obsolete models of language relationship. The study of Trans-Himalayan historical grammar began with Brian Houghton Hodgson in the 1830s, who during this time served at Kathmandu as the British Resident to the Kingdom of Nepal. Periodically, minor studies devoted attention to several of the more salient morphosyntactic phenomena of Trans-Himalayan historical grammar, but Stuart Wolfenden contributed the first major monograph to the subject in the 1920s. Finally, the historical morphosyntax of the Trans-Himalayan language family came to be the focus of numerous linguistic studies from the 1970s onward, and since that time our understanding of the historical grammar of the language family has changed drastically.
As ever more languages out of the hundreds of previously undocumented Trans-Himalayan tongues came to be described and analysed in great detail, it came to be understood that the flamboyant verbal agreement morphology observed in languages such as the Kiranti languages of eastern Nepal and the rGyalrongic languages of southwestern China was neither a grammatical innovation nor a typological fluke; rather, these languages were the most grammatically conservative within the entire language family. Subsequently, cognate inflectional systems or vestiges of cognate conjugational morphology were discovered in most other branches of the language family as well. The geographical centre, as well as the centre of phylogenetic diversity of the Trans-Himalayan language family, was identified as the highland arc of the Eastern Himalaya. Sinitic languages, although representing by far the most populous single branch of the Trans-Himalayan family, were now understood as constituting just one out of many subgroups, not more divergent from other branches than any one of the four dozen other subgroups making up the language family. The various types of epistemic marking systems observed sporadically throughout the region were shown to be secondary innovations, reflecting a great variety of semantically distinct language-specific grammatical categories. In particular, languages showing the typology of the Loloish or Sinitic type were shown to be innovative in their grammar, having lost much of the original Trans-Himalayan morphosyntax.
Gregory D. S. Anderson
The Munda language family constitutes the westernmost branch of the widespread Austroasiatic language family. Munda formerly was considered sister to the rest of the phylum, then known as Mon-Khmer, but this has been revised, and Munda is considered as Austroasiatic as any other branch. The internal classification of the Munda languages is still disputed, but a clear North Munda group exists and is uncontroversial. Other higher-order internal divisions remain disputed, although low-level groups like Sora-Gorum or Gutob-Remo are clear and accepted by almost all researchers today.
Phonologically speaking, Munda languages make extensive use of glottal stop and pre-glottalized stops, nasal vowels, and retroflexion. Word-level prosody shows Austroasiatic features with an overlay of South Asian areal features on the phrase level. Register and tone have been reported for individual languages, for example creaky voice in Gorum and a low tone in Korku.
Nouns in Munda languages may encode a range of grammatical and local cases, person and number of possessors, and covert distinctions of animacy in agreement and other morphosyntactic features. Verbs in Munda languages can be quite complex, with subject and object as well as TAM encoding, transitivity, finiteness, etc. Kherwarian languages stand out in this regard as well as for the distributional facts of the subject clitics, where the preferred locus is enclitic to the word immediately preceding the verb. Systems of negation can be very complicated and show unexpected interactions with TAM marking in languages like Gutob.
Syntactically, Munda languages show many typical South Asian features such as verb-final structure, as well as non-finite structures, and in some cases switch reference systems or noun incorporation.
The current sociolinguistic and demographic contexts of the different Munda languages range from expanding and healthy with official status in the case of Santali to seriously endangered in the case of Gorum.
Jack B. Martin
The noun-modifying clause construction (NMCC) in Japanese is a complex noun phrase in which a prenominal clause is dependent on the head noun. Naturally occurring instances of the construction demonstrate that a single structure, schematized as [[… predicate (finite/adnominal)] Noun], represents a wide range of semantic relations between the head noun and the dependent clause, encompassing some that would be expressed in other languages, such as English, by structurally distinct constructions: relative clauses, noun complement clauses, and other types of complex noun phrases. In that way, the Japanese NMCC demonstrates a clear case of the general noun-modifying clause construction (GNMCC), that is, an NMCC that has structural uniformity across interpretations that extend beyond the range of relative clauses.
One of the notable properties of the Japanese NMCC is that the modifying clause may consist only of the predicate, reflecting the fact that referential density is moderate in Japanese—arguments of a predicate are not required to be overtly expressed either in the main clause or in the modifying clause. Another property of the Japanese NMCC is that there is no explicit marking in the construction that indicates the grammatical or semantic relation between the head noun and the modifying clause. The two major constituents are simply juxtaposed to each other.
Successful construal of the intended interpretations of instances of such a construction, in the absence of explicit markings, likely relies on an aggregate of structural, semantic, and pragmatic factors, including the semantic content of the linguistic elements, verb valence information, and the interpreter’s real-world knowledge, in addition to the basic structural information.
Researchers with different theoretical approaches have studied Japanese NMCCs or subsets thereof. Syntactic approaches, inspired by generative grammar, have focused mostly on relative clauses and aimed to identify universally recognized syntactic principles. Studies that take the descriptive approach have focused on detailed descriptions and the classification of a wide spectrum of naturally occurring instances of the construction in Japanese. The third and most recent group of studies has emphasized the importance of semantics and pragmatics in accounting for a wide variety of naturally occurring instances.
The examination of Japanese NMCCs provides information about the nature of clausal noun modification and affords insights into languages beyond Japanese, as similar phenomena have reportedly been observed crosslinguistically to varying degrees.
Northeast Asia is one of the unique points on the globe where there are many language isolates and portmanteau families. From a conservative point of view, the Japanese language is a member of such a portmanteau family that has recently and increasingly been called Japonic in the Western literature. While Japanese is unquestionably a member of this Japonic language family, which consists of two Japanese languages (Japanese itself and the moribund Hachijō language) and four or five relatively closely related Ryūkyūan languages (Amami, Okinawan, Miyako, Yaeyama, and possibly Yonaguni), attempts have also been made to establish a genetic relationship between Japanese and various other language families. Most of these attempts have been amateurish, a major exception being the Koreo-Japonic hypothesis, which still remains unproven as well. It is also quite likely that the Japonic language family (or, more precisely, Insular Japonic) is the only linguistic grouping whose genetic relationship can be established beyond any doubt. A genetic relationship is also likely to exist between Japonic and a number of fragmentarily attested languages that once flourished in the south and center of the Korean Peninsula, but that died out no later than the 9th century A.D. The paucity of material available does not allow one to establish solid predictive-productive regular correspondences in many cases, but intuitively the genetic relationship seems to be a matter of fact. Anything beyond intuition, however, lies in the realm of conjecture and speculation. The alleged Koreo-Japonic relationship is best explained by a centuries-long contact relationship rather than by common origin, given such factors as the virtual absence of any kind of shared paradigmatic morphology, as well as multiple problems in establishing real (and not imagined or made-to-fit) regular correspondences. The Japanese-“Altaic” hypothesis is even more speculative and far-fetched.
Consequently, the conclusion is that the Japanese language or the Japonic language family has no demonstrable relationship with any other language family or language isolate on the planet.