This is an advance summary of a forthcoming article in the Oxford Research Encyclopedia of Linguistics.
Kra–Dai, also known as Tai–Kadai, Daic, and Kadai, is a family of highly diverse languages found in southern China, northeast India, and Southeast Asia. The family is estimated to comprise close to a hundred languages, with approximately 100 million speakers worldwide. As the name suggests, Kra-Dai is made up of two major groups, Kra and Dai. The former comprises a number of lesser-known languages, some of which have only a few hundred fluent speakers or even fewer. The latter (also known as Tai, or Kam-Tai) is well established and includes the best-known members of the family: Thai and Lao, the national languages of Thailand and Laos, whose speakers account for over half of the Kra-Dai-speaking population.
The ultimate genetic affiliation of Kra-Dai remains controversial, although a consensus among Western scholars holds that it belongs with Austronesian. Most Kra-Dai languages, and particularly the Kra languages, have no writing system of their own. Those with writing systems include Thai, Lao, Sipsongpanna Dai, and Tai Lue, which use Indic-based scripts. Others, such as Zhuang and the Kam-Sui languages of southern China and surrounding regions, use scripts based on Chinese characters. In the 1950s, the Chinese government also introduced romanized scripts for Zhuang and the Kam-Sui languages. Almost every group within Kra-Dai has a rich oral tradition.
The languages are typically tonal, isolating, and analytic, lacking inflectional morphology and making no distinction for number or gender. Many basic vocabulary items are monosyllabic, but disyllabic and polysyllabic compounds also abound. Some morphological processes yield groups of etymologically related words that differ through alternations in tone, initial consonant, or vowel. Reduplication is a salient word-formation mechanism. Syntactically, Kra-Dai languages can be described as having basic SVO word order, and they possess a rich system of noun classifiers. Other features include verb serialization without overt marking of grammatical relations. A number of lexical items (mostly verbs) may function as grammatical morphemes in syntactic operations. Temporal and aspectual meanings are expressed through tense-aspect markers typically derived from verbs, while mood and modality are conveyed by a rich array of discourse particles.
“Altaic” is a common term applied by linguists to a group of language families spread across Central Asia and the Far East that share a large number of structural and morphemic similarities, most likely not coincidental. In the early days of Altaic studies, these similarities were ascribed to the one-time existence of an ancestral language, “Proto-Altaic,” from which all these families are descended; circumstantial evidence and glottochronological calculations tentatively date this language to some time around the 6th–7th millennium
The nature of the relationship between the various units that constitute “Altaic,” a question sometimes referred to as “the Altaic controversy,” has been one of the most hotly debated topics in 20th-century historical linguistics and a major focal point of studies dealing with the prehistory of Central and East Eurasia. Supporters of “Proto-Altaic,” commonly known as “(pro-)Altaicists,” claim that only divergence from a common ancestor can account for the observed regular phonetic correspondences and other structural similarities, whereas “anti-Altaicists,” without denying the existence of such similarities, insist that they do not belong to the “core” layers of the respective languages and are therefore better explained as the result of lexical borrowing and other forms of areal contact.
As a rule, “pro-Altaicists” claim that “Proto-Altaic” is as reconstructible by means of the classic comparative method as any uncontroversial language family; in support of this view, they have made several attempts to assemble large bodies of etymological evidence for the hypothesis, backed by systems of regular phonetic correspondences between the compared languages. All of these, however, have been heavily criticized by “anti-Altaicists” for lack of methodological rigor, implausible phonetic and/or semantic changes, and confusion of recent borrowings with items allegedly inherited from a common ancestor. Although many of these objections are valid, it remains unclear whether they suffice to discredit completely the hypothesis of a genetic connection between the various branches of “Altaic,” which continues to be actively supported by a small but stable scholarly minority.
George van Driem
Several language families and a few language isolates are represented in the Himalayas, the world’s greatest massif, which runs for over 3,600 km. The best-represented language family in the region is Trans-Himalayan, whose centre of gravity and of phylogenetic diversity lies within the Eastern Himalaya. This family, one of the most populous on the planet in terms of number of speakers, used to be known as Tibeto-Burman, although in some circles it also went by the names “Indo-Chinese” or “Sino-Tibetan”; these two labels actually designate empirically unsupported and now obsolete models of language relationship. The study of Trans-Himalayan historical grammar began in the 1830s with Brian Houghton Hodgson, who was then serving in Kathmandu as the British Resident to the Kingdom of Nepal. Minor studies periodically addressed some of the more salient morphosyntactic phenomena of Trans-Himalayan historical grammar, but it was Stuart Wolfenden who contributed the first major monograph on the subject, in the 1920s. From the 1970s onward, the historical morphosyntax of the family became the focus of numerous linguistic studies, and since then our understanding of its historical grammar has changed drastically.
As ever more of the hundreds of previously undocumented Trans-Himalayan languages came to be described and analysed in great detail, it became clear that languages with flamboyant verbal agreement morphology, such as the Kiranti languages of eastern Nepal and the rGyalrongic languages of southwestern China, were neither grammatically innovative nor typological flukes, but were instead the most grammatically conservative languages within the entire family. Subsequently, cognate inflectional systems or vestiges of cognate conjugational morphology were discovered in most other branches of the family as well. The highland arc of the Eastern Himalaya was identified as both the geographical centre and the centre of phylogenetic diversity of the Trans-Himalayan family. Sinitic languages, although by far the most populous single branch of the family, were now understood as constituting just one of many subgroups, no more divergent from the other branches than any of the four dozen other subgroups that make up the family. The various types of epistemic marking systems observed sporadically throughout the region were shown to be secondary innovations, reflecting a great variety of semantically distinct, language-specific grammatical categories. In particular, languages of the Loloish or Sinitic typological type were shown to be grammatically innovative, having lost much of the original Trans-Himalayan morphosyntax.
Northeast Asia is one of the few places on the globe where many language isolates and portmanteau families are found. From a conservative point of view, Japanese belongs to one such family, which has recently and increasingly been called Japonic in the Western literature. While Japanese is unquestionably a member of the Japonic family, which consists of two Japanese languages (Japanese itself and the moribund Hachijō language) and four or five relatively closely related Ryūkyūan languages (Amami, Okinawan, Miyako, Yaeyama, and possibly Yonaguni), attempts have also been made to establish a genetic relationship between Japanese and various other language families. Most of these attempts have been amateurish; a major exception is the Koreo-Japonic hypothesis, which nonetheless also remains unproven. It is quite likely that the Japonic family (or, more precisely, Insular Japonic) is the only grouping involving Japanese whose genetic relationship can be established beyond any doubt. A genetic relationship is also likely between Japonic and a number of fragmentarily attested languages that once flourished in the south and center of the Korean Peninsula but died out no later than the 9th century A.D. In many cases the paucity of available material does not allow one to establish solid predictive-productive regular correspondences, but intuitively the genetic relationship seems real. Anything beyond intuition, however, lies in the realm of conjecture and speculation. The alleged Koreo-Japonic relationship is best explained by centuries of contact rather than by common origin, given the virtual absence of any shared paradigmatic morphology and the many problems in establishing genuine (rather than imagined or made-to-fit) regular correspondences. The Japanese-“Altaic” hypothesis is even more speculative and far-fetched.
The conclusion, therefore, is that Japanese, or the Japonic language family as a whole, has no demonstrable relationship with any other language family or language isolate on the planet.
Timothy J. Vance
The term rendaku, sometimes translated as sequential voicing, denotes a morphophonemic phenomenon in Japanese. In a prototypical case, an alternating morpheme appears with an initial voiceless obstruent as a word on its own or as the initial element (E1) in a compound but with an initial voiced obstruent as the second element (E2) in a two-element compound. For example, the simplex word /take/ ‘bamboo’ and the compound /take+yabu/ ‘bamboo grove’ (cf. /yabu/ ‘grove’) begin with voiceless /t/, but this morpheme meaning ‘bamboo’ begins with voiced /d/ in /sao+dake/ ‘bamboo (made into a) pole’ (cf. /sao/ ‘pole’). Rendaku was already firmly established in 8th-century Old Japanese (OJ), the earliest variety for which extensive written records exist, and subsequent sound changes have made the alternations phonetically heterogeneous. Many OJ compounds with eligible E2s did not undergo rendaku, and the phenomenon remains pervasively irregular in modern Japanese. There are, however, many factors that promote or inhibit rendaku, and some of these appear to influence native-speaker behavior on experimental tasks. The best-known phonological factor is Lyman’s Law, according to which rendaku does not apply to E2s that contain a non-initial voiced obstruent. Many theoretical phonologists endorse the idea that Lyman’s Law is a sub-case of the Obligatory Contour Principle, which rules out identical or similar units if they would be adjacent in some domain. Other well-known factors involve vocabulary stratum (e.g., the resistance to rendaku of recently borrowed E2s) or the morphological/semantic relationship between E2 and E1 (e.g., the resistance to rendaku of coordinate compounds). Some morphemes are idiosyncratically immune to rendaku. Other morphemes alternate, undergoing rendaku in some compounds but not in others, even though no known factor is relevant.
In addition, many individual compounds vary between a form with rendaku and a form without, and this variability is often not reflected in dictionary entries. Despite its irregularity, rendaku is productive in the sense that it often applies to newly created compounds. Many compounds are, of course, stored (with or without rendaku) in a speaker’s lexicon, but the fact that native speakers can apply rendaku not just to existing E2s in novel compounds but even to made-up E2s shows that rendaku as an active process is somehow incorporated into the grammar.
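To make the interaction between the basic voicing rule and Lyman’s Law concrete, the following Python sketch models a heavily idealized version of rendaku. It assumes a broad phonemic romanization and a voicing map covering only /k, s, t, h/ to /g, z, d, b/; the function names are hypothetical. The sketch deliberately ignores the irregularity, vocabulary-stratum effects, resistance of coordinate compounds, and idiosyncratic immunity described above, so it over-applies rendaku relative to real Japanese.

```python
# A deliberately idealized model of rendaku constrained by Lyman's Law.
# Assumptions (illustrative only): broad phonemic romanization, and only
# the k/s/t/h series alternates with g/z/d/b.
VOICING = {"k": "g", "s": "z", "t": "d", "h": "b"}
VOICED_OBSTRUENTS = frozenset("gzdb")


def blocked_by_lymans_law(e2: str) -> bool:
    """Return True if E2 contains a voiced obstruent after its initial segment."""
    return any(segment in VOICED_OBSTRUENTS for segment in e2[1:])


def compound(e1: str, e2: str) -> str:
    """Join E1 and E2, voicing E2's initial obstruent unless Lyman's Law blocks it.

    Real rendaku is pervasively irregular; this sketch applies it to every
    eligible E2 and ignores stratum, coordinate compounds, and lexical exceptions.
    """
    initial = e2[:1]
    if initial in VOICING and not blocked_by_lymans_law(e2):
        e2 = VOICING[initial] + e2[1:]
    return e1 + e2


if __name__ == "__main__":
    for e1, e2 in [("sao", "take"), ("take", "yabu"), ("kami", "kaze"), ("hito", "hito")]:
        print(f"/{e1}+{e2}/ -> /{compound(e1, e2)}/")
```

Run as a script, it yields /saodake/ and /takeyabu/ for the examples in the text (the latter because /y/ is not an eligible obstruent), /kamikaze/ ‘divine wind’ because /kaze/ ‘wind’ already contains voiced /z/, and /hitobito/ ‘people’. A lexicon of exceptions would be needed to approximate actual usage.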
Hokan is a linguistic stock or phylum based on a series of hypotheses about deeper genetic relationships among languages extending geographically from Northern California to Nicaragua. As part of a broader effort to link the vast number of Native American languages genetically and to reduce them to a few superstocks, Dixon and Kroeber first proposed the Hokan stock in 1913 to include several indigenous languages of California: Karuk, Chimariko, Shastan, Palaihnihan (Atsugewi and Achumawi), Pomoan, and Yana, with Esselen and Yuman added later. The name Hokan derives from the Atsugewi word for “two”: hoqi. While Dixon and Kroeber’s first proposals rested on very limited cognate sets comprising only five words, Sapir’s later assessments included hundreds of putative cognate sets and analyses of Hokan morphosyntax. By 1925, Sapir had also brought Washo, Salinan, Seri, Chumashan, Tequistlatecan, and Subtiaba-Tlapanec into the stock as its Southern Hokan branch.
Throughout the 20th century, scholars sought additional evidence for the stock as more and better data on the languages became available. A number of languages were added, and some earlier proposals were abandoned. A new surge of work on individual indigenous languages of California in the 1950s and 1960s prompted a string of binary comparisons. This renewed interest inspired a series of Hokan conferences, held until the 1990s. A more recent comprehensive assessment of the entire stock was undertaken by Kaufman in 1988. Applying rigorous analysis and including only those languages for which he found substantial evidence, Kaufman proposed sixteen geographically clustered classificatory units for Hokan. Kaufman’s Hokan stock also includes Coahuilteco and Comecrudan in northeastern Mexico and South Texas, and Jicaque in Honduras.
Although Hokan was widely studied in the 20th century, and many scholars presented what they took to be supporting evidence, it is far from being an established genetic unit; many scholars today treat it with considerable skepticism. One major challenge, as with any phylum-level grouping, is its time depth: Proto-Hokan is thought to be at least as old as Proto-Indo-European. Moreover, many of the languages were spoken in geographically contiguous areas, as in Northern California, where speakers were multilingual and in close contact over an extended period. This points to considerable contact effects, which complicate the distinction between true cognates and ancient borrowings. Many of the languages also show similarities in grammatical structure that result from language contact.
Hokan languages are (or were) spoken across California, Nevada, South Texas, various parts of Mexico, Honduras, and Nicaragua, and they display notable structural differences. Phonologically, they vary greatly, with both small and large phoneme inventories and a range of phonological processes. Typologically, they are equally diverse, although many are considered polysynthetic to varying degrees. Morphosyntactic similarities are evident especially among the languages of Northern California; these include sets of lexical affixes with similar meanings and affinities in core argument patterns.
Cynthia L. Allen
Middle English is the name given to the English of the period from approximately 1100 to approximately 1450. This period is marked by substantial developments in all areas of English grammar. It is also the period in which regional dialects of English are most fully attested in texts. At the beginning of the Middle English period, the sociolinguistic status of English was low as a result of the Norman Conquest, and although religious texts composed in Old English continued to be copied and updated, few original compositions survive. By the end of the period, English had regained its status as the language of government, law, and literature generally.
Although some notable changes to the consonant inventory date from the Middle English period, the most dramatic phonological developments of the period involve vowels. The reduction of vowels in unstressed syllables, one of the changes that marks the beginning of the Middle English period, had substantial morphological effects, as it greatly reduced the number of distinct inflectional forms. Constituent order replaced case marking as the primary means of signaling grammatical relations, and by the end of the period subject-verb-object order had become established as the norm.
The lexicon of English was transformed in this period by an enormous influx of French words. The role of derivational morphology declined as its functions were to some extent replaced by the adoption of French words. Most Scandinavian loans in English first appear in the texts of this period. The Scandinavian loans are typically everyday words, while the words adopted from French are more often in areas of government, law, and higher culture, reflecting the nature of the contact between English speakers and the speakers of these languages.
The density of the Scandinavian population in the north of England is generally held to be responsible for the fact that changes tend to appear earlier in the north than in the south. The replacement of the third person plural personal pronoun hie by Scandinavian they is an example: early in the Middle English period it is attested only in the north, but by the end of the period it had become general in English.
An important phonological development of later Middle English is the beginning of the Great Vowel Shift, a series of successive changes affecting the long vowels that was implemented differently in different dialects, the north-south divide being the most evident.
Early Middle English cannot be understood by Modern English readers without special study, whereas late Middle English, especially that of the London area, can be understood with the help of extensive explanatory notes.
Languages from at least five genetically unrelated families are spoken in the Caucasus, but only three language families are endemic to the region: Kartvelian, West Caucasian, and Northeast Caucasian. These families differ considerably in the number of languages they contain and in how speakers are distributed among them. In the Caucasus, languages with millions of speakers have coexisted with one-village languages for hundreds of years, and multilingualism has always been the norm. The richness of the Caucasian languages at every linguistic level is dazzling: here we find some of the world’s largest consonant inventories, inflectional systems in which the sheer number of word forms strains credibility (Archi, for example, is claimed to have over one and a half million word forms), and challenging syntactic structures. The typological interest of the Caucasian languages, and the challenges they present to linguistic theory, lie in different areas. In the Kartvelian languages, for instance, the number of factors at play in the verbal system makes producing a correct verb form far from trivial. West Caucasian languages show polysynthetic, polypersonal verb inflection, which is unusual not only for the Caucasus but for Eurasia in general. Northeast Caucasian languages have large systems of non-finite forms that, unusually, retain gender and number agreement; their non-finiteness lies instead in their inability to head an independent clause and to express certain morphosyntactic categories such as illocutionary force and evidentiality. Finally, all Caucasian languages are ergative to some extent.
Empirical and theoretical research on language has grown extensively in recent years. In the case of Japanese, however, far fewer studies, particularly in English, have examined adult second language (L2) learners and bilingual children. As the field develops, it is increasingly important to integrate theoretical concepts and empirical research findings in the second language acquisition (SLA) of Japanese, so that they can eventually be applied to educational practice. This article attempts to (a) address at least some of the gaps in the current literature, (b) cover important topics to the extent possible, and (c) discuss various problems concerning adult learners of Japanese as an L2 and English–Japanese bilingual children. Specifically, the article first examines the characteristics of the Japanese language. Then, tracing the history of SLA studies, it deliberately touches on a wide spectrum of domains of linguistic knowledge (e.g., phonology and phonetics, morphology, lexicon, semantics, syntax, discourse), contexts of language use (e.g., interactive conversation, narrative), research orientations (e.g., formal linguistics, psycholinguistics, social psychology, sociolinguistics), and age groups (e.g., children, adults). Finally, by connecting past SLA findings on English with recent and current concerns in Japanese SLA, focusing on the past 10 years and including corpus linguistics, the article offers an overview of the field of Japanese linguistics and its critical issues.
Haihua Pan and Yuli Feng
Cross-linguistic data can offer new insights into semantic theory or even prompt a paradigm shift in research. Major topics in semantics such as bare noun denotation, quantification, degree semantics, polarity items, donkey anaphora and binding principles, long-distance reflexives, negation, tense and aspect, and eventualities have all been discussed by semanticists working on Chinese. Issues of particular interest include, but are not limited to: (i) the denotation of Chinese bare nouns; (ii) the categorization and quantificational mapping strategies of Chinese quantifier expressions (i.e., whether the behavior of Chinese quantifier expressions fits the dichotomy between A-quantification and D-quantification); (iii) the multiple uses of quantifier expressions (e.g., dou) and their implications for the interrelation of semantic concepts such as distributivity, scalarity, exclusiveness, exhaustivity, and maximality; (iv) the interaction among universal adverbials, and between universal adverbials and various types of noun phrases, which may pose a challenge to the Principle of Compositionality; (v) the semantics of degree expressions in Chinese; (vi) the non-interrogative uses of wh-phrases in Chinese and their bearing on theories of polarity items, free choice items, and epistemic indefinites; (vii) how E-type and D-type pronouns are manifested in Chinese and whether such pronoun interpretations correspond to specific sentence types; (viii) what devices Chinese uses to locate time (i.e., whether tense interpretation corresponds to certain syntactic projections or is determined solely by semantic information and pragmatic reasoning); (ix) how the interpretation of Chinese aspect markers can be captured by event structures, possible world semantics, and quantification; (x) how the long-distance binding of Chinese ziji ‘self’ and the blocking effect of first and second person pronouns can be accounted for by existing theories of belief, attitude reports, and logophoricity; (xi) the distribution of various negation markers and their correspondence to the semantic properties of the predicates they combine with; and (xii) whether Chinese topic-comment structures are constrained by both semantic and pragmatic factors or by syntactic factors only.