Gregory D. S. Anderson
The Munda language family constitutes the westernmost branch of the widespread Austroasiatic language family. Munda formerly was considered sister to the rest of the phylum, then known as Mon-Khmer, but this has been revised, and Munda is considered as Austroasiatic as any other branch. The internal classification of the Munda languages is still disputed, but a clear North Munda group exists and is uncontroversial. Other higher-order internal divisions remain disputed, although low-level groups like Sora-Gorum or Gutob-Remo are clear and accepted by almost all researchers today.
Phonologically speaking, Munda languages make extensive use of glottal stop and pre-glottalized stops, nasal vowels, and retroflexion. Word level prosody shows Austroasiatic features with an overlay of South Asian areal features on the phrase level. Register and tone have been reported for individual languages such as creaky voice in Gorum and a low tone in Korku.
Nouns in Munda languages may encode a range of grammatical and local cases, person and number of possessors, and covert distinctions of animacy in agreement and other morphosyntactic features. Verbs in Munda languages can be quite complex, with subject and object as well as TAM encoding, transitivity, finiteness, etc. Kherwarian languages stand out in this regard as well as for the distributional facts of the subject clitics, where the preferred locus is enclitic to the word immediately preceding the verb. Systems of negation can be very complicated and show unexpected interactions with TAM marking in languages like Gutob.
Syntactically, Munda languages show many typical South Asian features such as verb-final structure, as well as non-finite structures, and in some cases switch reference systems or noun incorporation.
The current sociolinguistic and demographic contexts of the different Munda languages range from expanding and healthy with official status in the case of Santali to seriously endangered in the case of Gorum.
Languages from at least five genetically unrelated families are spoken in the Caucasus, but there are only three endemic linguistic families belonging to the region: Kartvelian, West Caucasian, and Northeast Caucasian. These families are rather heterogeneous in terms of the number of languages and the distribution of the speakers across them. The Caucasus represents a situation where languages with millions of speakers have coexisted with one-village languages for hundreds of years, and where multilingualism has always been the norm. The richness of Caucasian languages on every linguistic stratum is dazzling: here we find some of the largest consonant inventories, inflectional systems where the mere number of word forms strains credibility (one of the Caucasian languages, Archi, is claimed to have over a million and a half word forms), and challenging syntactic structures. The typological interest of the Caucasian languages and the challenges they present to linguistic theory lie in different areas. Thus, for Kartvelian languages, the number of factors at play in the verbal system makes the task of producing a correct verbal form far from trivial. West Caucasian languages represent an instance of polysynthetic polypersonal verb inflection, which is unusual not only for the Caucasus but for Eurasia in general. East Caucasian languages have large systems of non-finite forms which, unusually, retain the ability to realize agreement in gender and number while their non-finite nature is determined by the inability to head an independent clause and to express certain morpho-syntactic categories such as illocutionary force and evidentiality. Finally, all Caucasian languages are ergative to some extent.
Hokan is a linguistic stock or phylum based on a series of hypotheses about deeper genetic relationships among languages that extend geographically from Northern California to Nicaragua. Following the general effort to genetically link the vast number of Native American languages and to reduce them to a few superstocks, Dixon and Kroeber first proposed the Hokan stock in 1913, to include several California indigenous languages: Karuk, Chimariko, Shastan, Palaihnihan (Atsugewi and Achumawi), Pomoan, Yana, and later Esselen and Yuman. The name Hokan stems from the Atsugewi word for “two”: hoqi. While the first proposals by Dixon and Kroeber rested on very limited cognate sets comprising only five words, later assessments by Sapir included hundreds of putative cognate sets and analyses of Hokan morphosyntax. By 1925, Sapir further included Washo, Salinan, Seri, Chumashan, Tequistlatecan, and Subtiaba-Tlapanec as the Southern Hokan branch into the stock.
Throughout the 20th century, scholars sought additional evidence for the stock as more and refined data on the languages became available. A number of languages were added, and earlier proposals were abandoned. A new surge in work on individual California indigenous languages in the 1950s and 1960s prompted a string of studies conducting binary comparisons. This renewed interest inspired a series of Hokan conferences held until the 1990s. A more recent comprehensive assessment of the entire stock was undertaken by Kaufman in 1988. Applying rigorous analysis and including only those languages for which he encountered substantial evidence, Kaufman proposed sixteen classificatory units for Hokan, clustered geographically. Kaufman’s Hokan stock also includes Coahuilteco and Comecrudan in Mexico and Jicaque in Nicaragua.
Although Hokan was widely studied in the 20th century, and many scholars presented what they thought to be supporting evidence, it is far from being an established genetic unit. In fact, many scholars today treat it with considerable skepticism. One major challenge, as with any phylum-level affiliation, is its time depth. Proto-Hokan is thought to be at least as old as Proto-Indo-European. Moreover, many of the languages were spoken in geographically contiguous areas, with speakers being multilingual and in close contact for an extended period of time, as is the case in Northern California. This suggests considerable language contact effects and complicates the distinction between true cognates and ancient borrowings. Many of the languages involved further show similarities in grammatical structure as a result of language contact.
Hokan languages stretch across California, Nevada, South Texas, various parts of Mexico, Honduras, and Nicaragua and display notable structural differences. Phonologically, the languages show great variation including small and large phoneme inventories and different phonological processes. Typologically, they are equally diverse, but many are considered polysynthetic to varying degrees. Morphosyntactic and grammatical similarities are evident especially among languages spoken in Northern California. These resemblances include sets of lexical affixes with similar meanings and affinities in core argument patterns.
D. Gary Miller
Apart from runic inscriptions, Gothic is the earliest attested language of the Germanic family, dating to the 4th century. Along with Crimean Gothic, it belongs to the branch known as East Germanic. The bulk of the extant Gothic corpus is a translation of the Bible, of which only a portion remains. The translation is traditionally ascribed to Wulfila, who is credited with inventing the Gothic alphabet. The many Greek conventions both help and hinder interpretation of the Gothic phonological system. As in Greek, letters of the alphabet functioned as numerals, but the late letter names were from runic.
Gothic inflectional categories include nouns, adjectives, and verbs. Nouns are inflected for three genders, two numbers, and four cases. Various stem types inherited from Indo-European constitute different form classes in Gothic. Adjectives have the same properties and are also inflected according to so-called weak and strong forms, as are Gothic verbs. Verbs are inflected for three persons and numbers, an indicative and a nonindicative mood (here called “optative”), past and nonpast tense, and voice. The mediopassive survives in Gothic morphologically as a synthetic passive and syntactically in innovated periphrastic formations; middle and anticausative functions were taken over by reflexive-type structures. Nonfinite forms are the infinitive, the imperative, and two participles.
In syntax, Gothic had null subjects as an option, mostly in the third person singular. Aspect was effected primarily by prefixes, which have many other functions, and aspect is not consistently indicated. Absolute constructions with a participle occurred in various cases with functional differences. Relativization was effected primarily by relative pronouns built on demonstratives plus a complementizer. Complementizers could be used with subordinate clause verbs in the indicative or optative. The switch to the optative was triggered by irrealis, matrix verbs that do not permit a full range of subordinate tenses, expression of a hope or wish, potentiality, and several other conditions. Many of these are also relevant to matrix clauses (independent optatives).
Essentials of linearization include prepositional phrases, default postposed genitives and possessive adjectives, and preposed demonstratives. Verb-object order predominates, but there is much Greek influence. Verb-auxiliary order is native Gothic.
Cynthia L. Allen
This is an advance summary of a forthcoming article in the Oxford Research Encyclopedia of Linguistics.
Middle English is the name given to the English of the period from approximately 1100 to approximately 1450. This period is marked by substantial developments in all areas of English grammar. It is also the period of English when different dialects are the most fully attested in the texts. At the beginning of the Middle English period, the sociolinguistic status of English was low due to the Norman Invasion, and although religious texts of Old English composition continued to be copied and updated, few original compositions are extant. By the end of the period, English had regained its status as the language of government, law, and literature generally.
Although some notable changes to the phonemic inventory of consonants date from the Middle English period, the most dramatic phonological developments of the period involve vowels. The reduction of the vowels of unstressed syllables, one of the changes that mark the beginning of the Middle English period, is a phonological change with substantial morphological effects, as it greatly reduced the number of distinctive inflectional forms. Constituent order replaced case marking as the primary means of signaling grammatical relations. By the end of the Middle English period, subject-verb-object order had become established as the norm.
The lexicon of English was transformed in this period by an enormous influx of French words. The role of derivational morphology declined as its functions were to some extent replaced by the adoption of French words. Most Scandinavian loans in English first appear in the texts of this period. The Scandinavian loans are typically everyday words, while the words adopted from French are more often in areas of government, law, and higher culture, reflecting the nature of the contact between English speakers and the speakers of these languages.
The density of the Scandinavian population in the northern part of England is generally held to be responsible for the earlier appearance of changes in the north than in the south. The replacement of the third person plural personal pronoun hie by the Scandinavian they is an example of a development which is apparent only in the north early in Middle English but became general in English by the end of this period.
An important phonological development of later Middle English is the beginning of the Great Vowel Shift, which affected long vowels and involved successive changes and was implemented differently in different dialects, the north-south divide being the most evident.
Early Middle English is a language that cannot be understood by Modern English readers without special study, while the language of the late Middle English period, especially that coming from the London area, can be understood with the heavy use of explanatory notes.
The word accent system of Tokyo Japanese might look quite complex with a number of accent patterns and rules. However, recent research has shown that it is not as complex as has been assumed if one incorporates the notion of markedness into the analysis: nouns have only two productive accent patterns, the antepenultimate and the unaccented pattern, and different accent rules can be generalized if one focuses on these two productive accent patterns.
The word accent system raises some new interesting issues. One of them concerns the fact that a majority of nouns are ‘unaccented,’ that is, they are pronounced with a rather flat pitch pattern, apparently violating the principle of obligatoriness. A careful analysis of noun accentuation reveals that this strange accent pattern occurs in some linguistically predictable structures. In morphologically simplex nouns, it typically tends to emerge in four-mora nouns ending in a sequence of light syllables. In compound nouns, on the other hand, it emerges due to multiple factors, such as compound-final deaccenting morphemes, deaccenting pseudo-morphemes, and some types of prosodic configurations.
Japanese pitch accent also interacts in interesting ways with other phonological and linguistic structures. For example, the accent of compound nouns is closely related to rendaku, or sequential voicing; the choice between the accented and unaccented patterns in certain types of compound nouns correlates with the presence or absence of sequential voicing. Moreover, whether the compound accent rule applies to a certain compound depends on its internal morphosyntactic configuration as well as its meaning; put differently, the compound accent rule is blocked in certain types of morphosyntactic and semantic structures.
Finally, careful analysis of word accent sheds new light on the syllable structure of the language, notably on two interrelated questions about diphthong-hood and super-heavy syllables. It provides crucial insight into ‘diphthongs,’ or the question of which vowel sequence constitutes a diphthong, as opposed to a vowel sequence across a syllable boundary. It also presents new evidence against trimoraic syllables in the language.
Timothy J. Vance
The term rendaku, sometimes translated as sequential voicing, denotes a morphophonemic phenomenon in Japanese. In a prototypical case, an alternating morpheme appears with an initial voiceless obstruent as a word on its own or as the initial element (E1) in a compound but with an initial voiced obstruent as the second element (E2) in a two-element compound. For example, the simplex word /take/ ‘bamboo’ and the compound /take+yabu/ ‘bamboo grove’ (cf. /yabu/ ‘grove’) begin with voiceless /t/, but this morpheme meaning ‘bamboo’ begins with voiced /d/ in /sao+dake/ ‘bamboo (made into a) pole’ (cf. /sao/ ‘pole’). Rendaku was already firmly established in 8th-century Old Japanese (OJ), the earliest variety for which extensive written records exist, and subsequent sound changes have made the alternations phonetically heterogeneous. Many OJ compounds with eligible E2s did not undergo rendaku, and the phenomenon remains pervasively irregular in modern Japanese. There are, however, many factors that promote or inhibit rendaku, and some of these appear to influence native-speaker behavior on experimental tasks. The best known phonological factor is Lyman’s Law, according to which rendaku does not apply to E2s that contain a non-initial voiced obstruent. Many theoretical phonologists endorse the idea that Lyman’s Law is a sub-case of the Obligatory Contour Principle, which rules out identical or similar units if they would be adjacent in some domain. Other well-known factors involve vocabulary stratum (e.g., the resistance to rendaku of recently borrowed E2s) or the morphological/semantic relationship between E2 and E1 (e.g., the resistance to rendaku of coordinate compounds). Some morphemes are idiosyncratically immune to rendaku. Other morphemes alternate but undergo rendaku in some compounds while failing to undergo it in others, even though no known factor is relevant. 
In addition, many individual compounds vary between a form with rendaku and a form without, and this variability is often not reflected in dictionary entries. Despite its irregularity, rendaku is productive in the sense that it often applies to newly created compounds. Many compounds, of course, are stored (with or without rendaku) in a speaker’s lexicon, but the fact that native speakers can apply rendaku not just to existing E2s in novel compounds but even to made-up E2s shows that rendaku as an active process is somehow incorporated into the grammar.
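The prototypical alternation and the blocking effect of Lyman’s Law described above can be illustrated with a minimal sketch. It is a deliberate idealization: it assumes a simplified phonemic romanization, a hypothetical four-way voicing table, and fully regular application, whereas rendaku is pervasively irregular and sensitive to vocabulary stratum and compound type. The function name and the example /kami+kaze/ ‘divine wind’ are introduced here for illustration and are not part of the description above.

```python
# Assumed, simplified mapping from voiceless initial obstruents to their
# voiced rendaku counterparts in a phonemic romanization.
VOICING = {"k": "g", "s": "z", "t": "d", "h": "b"}

# Assumed set of voiced obstruents that trigger Lyman's Law.
VOICED_OBSTRUENTS = set("gzdb")


def apply_rendaku(e1: str, e2: str) -> str:
    """Combine E1 and E2, voicing the initial obstruent of E2 when eligible.

    Idealized: treats rendaku as exceptionless, which it is not.
    """
    initial = e2[0]
    # E2 must begin with a voiceless obstruent to be eligible at all.
    if initial not in VOICING:
        return e1 + e2
    # Lyman's Law: a non-initial voiced obstruent in E2 blocks rendaku.
    if any(segment in VOICED_OBSTRUENTS for segment in e2[1:]):
        return e1 + e2
    return e1 + VOICING[initial] + e2[1:]


print(apply_rendaku("sao", "take"))   # E2 /take/ is eligible
print(apply_rendaku("take", "yabu"))  # E2 /yabu/ has no initial voiceless obstruent
print(apply_rendaku("kami", "kaze"))  # E2 /kaze/ contains non-initial /z/
```

Under these assumptions the first call voices the initial /t/ of /take/, while the other two leave E2 unchanged. A more realistic model would also need lexical exceptions and the variable compounds noted above.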
Japanese is a language where the grammatical status of arguments and adjuncts is marked exclusively by postnominal case markers, and various argument realization patterns can be assessed by their case marking. Since Japanese is categorized as a language of the nominative-accusative type typologically, the unmarked case-marking frame obtained for transitive predicates of the non-stative (or eventive) type is ‘nominative-accusative’. Nevertheless, transitive predicates falling into the stative class often have other case-marking alignments, such as ‘nominative-nominative’ and ‘dative-nominative’. Consequently, Japanese provides much more varying argument realization patterns than those expected from its typological character as a nominative-accusative language.
In fact, argument marking can be much more elastic and variable, with the variations motivated by several linguistic factors. Arguments often have the option of receiving either syntactic or semantic case, with no difference in the logical or cognitive meaning (as in plural agent and source agent alternations) or depending on the meanings their predicates carry (as in locative alternation). The type of case marking that is not normally available in main clauses can sometimes be obtained in embedded contexts (i.e., in exceptional case marking and small-clause constructions). In complex predicates, including causative and indirect passive predicates, arguments are case-marked differently from their base clauses by virtue of suffixation, and their case patterns follow the mono-clausal case array, despite the fact that they have multi-clausal structures.
Various case marking options are also made available for arguments by grammatical operations. Some processes instantiate a change in the grammatical relations and case marking of arguments with no affixation or embedding. Japanese has the grammatical process of subjectivization, creating extra (non-thematic) major subjects, many of which are identified as instances of ‘possessor raising’ (or argument ascension). There is another type of grammatical process, which reduces the number of arguments by virtue of incorporating a noun into the predicate, as found in the light verb constructions with suru ‘do’ and the complex adjective constructions formed on the negative adjective nai ‘non-existent.’
K. A. Jayaseelan
The Dravidian languages have a long-distance reflexive anaphor taan. (It is taan in Tamil and Malayalam, taanu in Kannada and tanu in Telugu.) As is the case with other long-distance anaphors, it is subject-oriented; it is also [+human] and third person. Interestingly, it is infelicitous if bound within the minimal clause when it is an argument of the verb. (That is, it seems to obey Principle B of the binding theory.) Although it is subject-oriented in the normal case, it can be bound by a non-subject if the verb is a “psych predicate,” that is, a predicate that denotes a feeling; in this case, it can be bound by the experiencer of the feeling. Again, in a discourse that depicts the thoughts, feelings, or point of view of a protagonist—the so-called “logophoric contexts”—it can be coreferential with the protagonist even if the latter is mentioned only in the preceding discourse (not within the sentence). These latter facts suggest that the anaphor is in fact coindexed with the perspective of the clause (rather than with the subject per se). In cases where this anaphor needs to be coindexed with the minimal subject (to express a meaning like ‘John loves himself’), the Dravidian languages exhibit two strategies to circumvent the Principle B effect. Malayalam adds an emphasis marker tanne to the anaphor; taan tanne can corefer with the minimal subject. This strategy parallels the strategy of European languages and East Asian languages (cf. Scandinavian seg selv). The three other major Dravidian languages—Tamil, Telugu, and Kannada—use a verbal reflexive: they add a light verb koL- (lit. ‘take’) to the verbal complex, which has the effect of reflexivizing the transitive predicate. (It either makes the verb intransitive or gives it a self-benefactive meaning.)
The Dravidian languages also have reciprocal and distributive anaphors. These have bipartite structures. An example of a Malayalam reciprocal anaphor is oral … matte aaL (‘one person … other person’). The distributive anaphor in Malayalam has the form awar-awar (‘they-they’); it is a reduplicated pronoun. The reciprocals and distributives are strict anaphors in the sense that they apparently obey Principle A; they must be bound in the domain of the minimal subject. They are not subject-oriented.
A noteworthy fact about the pronominal system of Dravidian is that the third person pronouns come in proximal-distal pairs, the proximal pronoun being used to refer to something nearby and the distal pronoun being used elsewhere.
Victor A. Friedman
The Balkan languages were the first group of languages whose similarities were explained in modern linguistic terms as a result of language contact rather than as a result of descent from a common ancestor. Nikolai Trubetzkoy coined the term Sprachbund ‘linguistic league’ (as opposed to Sprachfamilie ‘language family’) to describe this relationship. Balkan linguistics, as both a subset of and precursor to contact linguistics, is, at its base, an historical linguistic discipline. It seeks to explain similarities among the relevant languages as the result of diffusion rather than of either transmission or of putative universal, typological properties of human language (which latter assumes parallel developments whose causation is ahistorical, i.e., unconnected with either contact or ancestry). The relevant languages are, with the exception of Turkic, all part of the Indo-European language family, but they belong to five distinct groups that are known to have been separated for a significant length of time (presumably millennia). Moreover, for four out of five Indo-European groups as well as for Turkic, there exists documentation that goes back more than a millennium, and in some cases several millennia. The Balkan languages are thus the oldest example of a well-documented and still living Sprachbund.
The primary questions that Balkan linguistics seeks to answer are these: What are the results of language contact in the Balkan languages, and how did they come about? The Balkan languages are traditionally defined as Albanian, Modern Greek, Balkan Romance (Romanian, Aromanian, and Meglenoromanian), and Balkan Slavic (Bulgarian, Macedonian, and the southernmost dialects of the former Serbo-Croatian). In recent decades, it has been recognized that the relevant dialects of Romani, Judezmo, and Turkish and Gagauz also participate in at least some of the convergent processes that are taken as definitive of the Balkan linguistic league. While the language family is defined by regular sound correspondences, which in turn help define shared morphology and a core lexicon, the Balkan linguistic league is defined principally by shared morphosyntactic developments and a shared lexicon of borrowings often called “cultural.” In the Balkan linguistic league, phonological developments are sometimes shared among different languages at the dialectal level, but there are no such features that characterize the Balkan languages as a group. Just as in the language family not every diagnostic item is represented in every branch, so, too, in the Balkan linguistic league not every feature is equally represented in all languages and dialects.
Among the most characteristic morphosyntactic features are the following: (1) replacement of infinitives by analytic subjunctives, (2) the use of a particle derived from etymological ‘want’ to mark the future, (3) replacement of synthetic gradation of adjectives with analytic constructions, (4) replacement of conditionals by anterior futures, (5) resumptive clitic pronouns for certain direct and indirect objects, (6) various simplifications in the declensional system, (7) postposed definite articles (for Balkan Slavic, Balkan Romance, and Albanian), and (8) grammaticalized evidentials (Balkan Slavic, Albanian, Turkic, and to some extent Balkan Romance and Romani). While some of these convergences began in the ancient or medieval periods, the Balkan linguistic league took its definitive modern shape during the centuries of the Ottoman Empire (14th to early 20th centuries).