Mixed Languages

Summary and Keywords

Mixed languages are a rare category of contact language which has gone from being an oddity of contact linguistics to the subject of media excitement, at least for one mixed language—Light Warlpiri. They show considerable diversity in structure, social function, and historical origins; nonetheless, they all emerged in situations of bilingualism where a common language is already present. In this respect, they do not serve a communicative function, but rather are markers of an in-group identity. Mixed languages provide a unique opportunity to study the often observable birth, life, and death of languages both in terms of the sociohistorical context of language genesis and the structural evolution of language.

Keywords: language contact, contact languages, mixed languages, pidgin and creole languages, relexification, metatypy, convergence, code-switching

Mixed languages are a type of contact language that arises as the result of the fusion of two languages, normally in situations of bilingualism. This dual linguistic parentage means that mixed languages cannot be classified according to standard historical comparative methods. They emerge in situations of severe social upheaval, serving as an expression of an altered identity, be it a new identity or the maintenance of an older identity. Beyond this broad definition, mixed languages are difficult to characterize on sociohistorical or typological grounds.

Mixed languages arise in diverse sociohistorical contexts; for example, speakers may be the children of mixed marriages or the descendants of groups who underwent colonial incursions or of migrants. Similarly, there is no coherent typology which sets apart mixed languages from other contact varieties. They fall roughly into three categories: (a) lexicon-grammar (L-G) mixed languages, where one language dominates the grammar and another language contributes significantly to the lexicon; (b) structural mixes, where both languages contribute significant amounts of grammatical material to the new language; and (c) converted languages, where a language maintains its lexicon but undergoes structural convergence with another language.

Mixed languages can be contrasted with other contact phenomena, such as pidgin and creole languages and code-switching. Mixed languages are created in situations where a common language already exists and communication is not at issue, whereas pidgin and creole languages are born out of the need for communication between people of a number of language groups. As a result, pidgin and creole languages are formed from (usually) one lexifier parent and a number of source languages which contribute to varying extents to the grammar, whereas mixed languages have two clear sources. Pidgin and creole languages are also (arguably) the result of successive generations of second language learners targeting the lexifier language rather than a situation of bilingualism, which is the case for mixed languages. Code-switching, on the other hand, is also found in bilingual contexts, but mixed languages show more stability (i.e., predictability in the sites of switches) and have developed new structures which are not reflected in either source language.

The identification of mixed languages as a legitimate subcategory of contact language was the result of work by Thomason and Kaufman (1988). Bakker and Mous’ (1994) and Thomason’s (1997b) edited volumes then drew together substantial amounts of data from languages which have been identified as being “mixed.” Michaelis et al. (2013) have since provided accessible sketches of some languages, and a number of monographs and edited volumes now provide detailed accounts of Michif (Bakker, 1997), Ma’á (Mous, 2003), Gurindji Kriol (Meakins, 2011), and Sri Lanka Malay (Nordhoff, 2009, 2012). One edited volume, a bibliography, and three substantial reviews of research on mixed languages also explore the theoretical implications of these languages (Bakker, 2013, 2015; Matras & Bakker, 2003; Meakins, 2013; Winford, 2003, Ch. 6).

We begin with an overview of languages that have been classified as mixed and present representative case studies of a number of languages within a typological classification of mixed languages. We then discuss the contemporary functions, sociohistorical origins, and linguistic processes which led to their genesis. As will be shown, mixed languages originate from a range of sociohistorical settings and linguistic processes that do not obviously predict the resultant shape of the language. Yet despite their bilingual origins, these languages operate as autonomous language systems.

1. Typology of Mixed Languages

Mixed languages are typologically diverse but can be categorized as L-G languages, structural mixes, or converted languages. This categorization is a refinement of other typologies, most notably Bakker (2003, 2015). Note that this is a synchronic classification only, based on comparisons with the source languages. Sociohistorical and linguistic processes are discussed in subsequent sections. For a more extensive list of languages and a comprehensive reference list, see Meakins (2013: 161–164).

1.1 Lexicon-Grammar Languages

Most mixed languages exhibit a split between the lexicon and grammar. Bakker (2003: 125) calls these lexicon-grammar (L-G) languages and lists 25 in a typological survey. L-G languages differ in which language provides the grammatical structure. Examples of L-G mixed languages which have the introduced language as the grammatical base are Angloromani and Ma’á, and examples where the ancestral language provides the grammar are Bilingual Navajo, Media Lengua, Kallawaya, and Old Helsinki Slang.

Just how much introduced lexical material is required to “qualify” as a mixed language is unclear. At the extreme end, 90 percent of Media Lengua’s vocabulary derives from the introduced language, Spanish. The percentages are much lower for Angloromani and Ma’á, but the operation of two parallel lexicons distinguishes these languages from normal borrowing scenarios. The use of parallel lexicons also differs from code-switching, because the speakers are not bilinguals but only have control over a second limited set of words or stems.

1.1.1 Angloromani

Angloromani is spoken by Romanies in Britain and is considered endangered (Matras, 2010). Currently it is not the language of conversation but rather is restricted to individual utterances. These utterances can be characterized as the use of a restricted set of Romani-derived lexicon, which Matras et al. (2007) call a lexical reservoir, within an English grammatical frame. This lexical reservoir exists largely in parallel with English lexicon and is drawn on in situations where speakers want to mark a sense of solidarity or group cohesion. An example of the coupling of Romani-derived lexicon with English grammar is given in (1) and (2). These sentences contain Romani words inserted into an English frame, for example nouns fowki “people” and poshaera “penny” and pronouns lesti “he” and mandi “me” (note that these pronouns are etymologically locative forms, although the case distinctions have been lost). The Romani-derived words are given in italics.


(1) The poor fowki that haven’t got a poshaera to their name!
    “The poor people who don’t have a penny to their name.” (Matras et al., 2007: 115)

(2) Lesti’s laughing at mandi.
    “He’s laughing at me.” (Matras, 2010: 114)

Verbs and function words such as maw “NEG” are also common, although Romani verb inflections are no longer used. Some Romani morphology remains, such as a genitive -engra suffix which attaches to lexical roots to create a related word such as masengra (from mas “meat”). Matras et al. (2007) also observe that Angloromani speakers do not always use the definite article, aspect and existential auxiliaries, and coreferential pronouns in places where they would be expected in English. They argue that these features are not specifically Romani but indicate that Angloromani has slightly different grammatical rules than English.

It is likely that Angloromani came about after the Romanies had already shifted to English, as an attempt to reclaim their heritage language through the use of Romani, although see Thomason and Kaufman (1988: 103–104) for an alternative explanation of its origins. Romani has also fused with other European languages to form mixed languages (Carling, Lindell, & Ambrazaitis, 2014).

1.1.2 Ma’á

Another mixed language, which makes use of a reservoir of lexical material from the ancestral language, is Ma’á. This mixed language is spoken by Mbugu communities in the Usambara Mountains in Tanzania. Like Angloromani, Ma’á is spoken alongside one of its source languages, Mbugu (Bantu), and makes use of additional but limited Cushitic and Maasai parallel lexicon. The Mbugu were originally a Cushitic-speaking group from Laikipia in Kenya. In order to escape persecution from the Maasai, they moved to the Usambara Mountains via the Pare Mountains. The mixed language Ma’á is considered to be the result of resisting assimilation with the neighboring Pare. In this respect, it represents the stubborn persistence of an ethnic group (Mous, 2003).

The grammar of Ma’á is largely Bantu, similar to Pare. The word order is SVO (Bantu), not SOV (Cushitic), and most verb morphology is prefixing (Bantu) rather than suffixing (Cushitic). In form, the morphology is both Cushitic and Bantu. Ma’á also has 16 noun classes marked by noun prefixes and agreement markers on verbs, again a structure common to Bantu languages. Some Cushitic influence on the grammar can be observed. For example, adjectives and possessives often begin with k, ku- (old masculine marker); pronominal possessors are suffixing; only three demonstratives are present, and they do not agree with noun gender; and two causative suffixes on the verb are of Cushitic origin. The lexicon of Ma’á is largely Bantu but also contains a core Southern Cushitic vocabulary. In the following example, the non-Bantu elements are italicized.














(3) “There was an elder called Kimweri.” (Mous, 2003: 9)

(4) “This is our well.” (Mous, 2003: 179)

Mous (2003) claims that Ma’á is a conscious and deliberate result of an attempt to undo a shift to Pare, whereby speakers tried to relearn their ancestral language. He suggests this happened through a paralexification process in which a Bantu lexicon and a combined Cushitic and Maasai lexicon exist in parallel. Mbugu draws from the Bantu lexicon, whereas speakers use Cushitic and Maasai words in Ma’á. This notion of paralexification is similar to the lexical reservoir described for Angloromani.

1.1.3 Media Lengua

Another mixed language which retains the grammar of the ancestral language is Media Lengua (Gómez Rendón, 2008; Muysken, 1997b; Steward, 2014, 2015). This mixed language is spoken in Central Ecuador by Quechuan workers, or obreros. The morphosyntactic frame of Media Lengua is essentially Quechua (ancestral language) and therefore agglutinating, with around 90 percent of its stems replaced by Spanish forms (introduced language), although the figure is 65 percent based on a Swadesh list count. Word order is predominantly SOV (Quechuan). Nouns are inflected for case (nominative, accusative, locative, and other semantic cases), number, and determiners, and there is no gender marking, all Quechuan features. Similarly, verbs are inflected for tense, aspect, person, and number agreement for subjects, and there is no object agreement. Free pronoun forms are derived from Spanish, but they conform to Quechua patterns, with no case or informal/formal distinction. Spanish also contributes some structural features not found in Quechua, for example the structure of embedded WH-questions. Other features of the grammar seem to have developed independently. The following examples demonstrate the pattern of Spanish stems with Quechuan suffixes (italicized).










(5) “I come to ask a favor.” (Muysken, 1997b: 365)

(6) “I fall into the water.” (Muysken, 1997b: 366)

Media Lengua is the result of a relexification process whereby Quechuan stems were replaced with Spanish forms on the basis of semantic equivalence. This mixed language began developing from around 1967. Around this time, many young men started working in the construction industry in a nearby provincial town and learning Spanish. This was the group who created Media Lengua. Muysken claims that the genesis of this mixed language occurred “because acculturated Indians could not completely identify with the traditional Quechua culture or the urban Spanish culture” (Muysken, 1997b: 376). Another mixed language that has developed on the basis of a Quechuan morphosyntactic frame is Kallawaya (Callahuaya), which relexifies Quechuan stems with Puquina forms (Hannß & Muysken, 2014; Muysken, 1997a). This mixed language is spoken by Kallawaya travelling healers in Bolivia.

1.1.4 Old Helsinki Slang

Mixes consisting of the grammar from one language and the lexicon from another may also be the result of a compromise between two different groups wishing to mark a new identity. Old Helsinki Slang is one such example. This mixed language was spoken in Helsinki between 1890 and 1950 by saki gangs which consisted of both Finnish- and Swedish-speaking boys and young men (Jarva, 2008). Old Helsinki Slang has a similar structure to Media Lengua: Swedish stems are inserted into a Finnish morphosyntactic frame. Up to 80 percent of stems in an Old Helsinki Slang clause can be of Swedish origin, although considerable variation probably existed between the use of Swedish and Finnish vocabulary, making it more like the lexical reservoir or paralexification process described for Angloromani and Ma’á. In a 200-word Swadesh list, 80 percent of verbs, adjectives, and nouns are of non-Finnish origin, while function words and closed-class lexical items are derived from Finnish, for example conjunctions, adpositions, pronouns, and numerals. An example of Old Helsinki Slang is given below. Finnish elements are represented by italics.




















(7) “Take Väiski to the sauna and wash his feet.” (Jarva, 2008: 53)

Old Helsinki Slang originated in Helsinki at the end of the 19th century as a result of the migration of Finnish speakers to Helsinki and increased bilingualism among Swedish speakers. Helsinki was established in a Swedish-speaking area of Finland. At the time, Swedish had a high status and was spoken by the upper class in Finland. This situation changed after the 1870s, with increasing numbers of Finnish immigrants to Helsinki and the increasing use of Finnish by the upper classes. Increasingly, Swedish speakers became bilingual in Finnish, though Finnish speakers generally remained monolingual (Jarva, 2008: 54–55). Old Helsinki Slang was born in the working-class areas in the northern quarters of Helsinki. Two-thirds of the population were Finnish speakers, but Swedish was still considered more prestigious. It was common for boys and young men to spend most of their time on the streets due to the lack of compulsory education and overcrowding in apartments. Many of these boys formed gangs called saki which consisted of both Finnish and Swedish speakers. The language mixture which emerged from these gangs was probably the result of a communicative compromise between these different groups of speakers and a way of marking a new in-group identity.

1.2 Structural Mixes

Although most mixed languages exhibit a split between the lexicon and grammar, others are more structurally mixed, as the following sketches of Mednyj Aleut, Michif, and Gurindji Kriol demonstrate. In these languages, both of the source languages contribute to the structure of the resultant mix, creating a composite morphosyntactic frame. By implication, both languages also contribute to the lexicon of the resultant mix. On the surface, these languages bear a striking resemblance to code-switching, and indeed they most likely originated in code-switching. Nonetheless, they can be distinguished from code-switching by the fact that they operate as an autonomous language system. These arguments are elaborated in section 6, “Mixed Languages as Autonomous Linguistic Systems.”

1.2.1 Mednyj Aleut

Mednyj Aleut is a highly endangered language spoken on Mednyj Island in the Bering Sea. The decline seems to be the result of the introduction of Russian education in the 1940s (Thomason, 1997a). Mednyj Island was first settled by Russian fur seal hunters in the early 19th century, and Aleuts were brought to the island soon after. Marriages between Russian men and Aleut women resulted, and the subsequent population were called “creoles.” Thomason (1997a) suggests that it was the creoles who created Mednyj Aleut. Golovko (2003) claims that they considered themselves Aleut, and regarded their language as a variety of Aleut.

In Mednyj Aleut, 61.5 percent of nouns and 94 percent of verb stems are Aleut. The nominal structure consists of mostly Aleut inflections, including two case distinctions, absolutive and relative, and various derivational suffixes such as agent, instrumental, and location. The most interesting locus of mixing occurs in the verb. Aleut provides most of the nonfinite verbal inflection, including agreement and the gerund form, and verb derivation, including inceptive, resultative, factive, transitivizer, and detransitivizer, whereas Russian provides the finite verbal inflectional morphology, including portmanteau morphemes that express tense, number, and person markers and a negative verb prefix derived from the Russian negative particle ne (Thomason, 1997a). Thus most verbs consist of an Aleut stem and Russian inflections, as shown in (8). All Aleut elements are in italics.










(8) “I brought you a parcel.” (Thomason, 1997a: 457)

This structural outcome does not provide clear clues about the genesis of Mednyj Aleut, although it is perhaps the result of mixed marriages. Nonetheless, whether Aleut elements were transferred to Russian or vice versa remains a point of contention.

1.2.2 Michif

Michif is also the result of mixed marriages, in this case between Plains Cree–speaking women and French Canadian fur traders. Its genesis probably occurred in the early 1800s, developing among bilingual children of nomadic families in the Red River Colony area (now Manitoba and North Dakota) (Bakker, 1997).

Like Mednyj Aleut, Michif shows a great degree of structural mixing. In this case, the locus of mixing occurs between the verb and noun systems. The verb system is from Cree (Algonquian, polysynthetic), including four verb classes (in/transitive in/animate) and inflections for clause type, tense/mood, voice/valency/direction/aspect, and person and number agreement. The noun system is derived from French, including constituent order (Det-Adj-N or Det-N-Adj), articles, and adjective gender agreement.

The language division of the noun and verb structures extends to the lexicon. Michif is composed of 83–94 percent French nouns and 88–99 percent Cree verbs, depending on the speaker. Interrogatives, postpositions, demonstratives, and personal pronouns are mostly Cree, while prepositions, adjectives, possessive pronouns, and numerals are almost exclusively French. Michif can be seen in use in the video A Conversation in Michif (2008). The French (NP)–Cree (VP) split is clearly demonstrated in (9) and (10). Cree elements are italicized.

















(9) “And when the wolf came to him, he opened his mouth.” (Bakker, 1997: 5)

(10) “The priest blessed the people.” (Bakker, 1997: 116)

As with all mixed languages, the origins of Michif are a matter of speculation. The conventionalization of French-Cree code-switching has also been offered as an explanation for its formation, although Bakker (2003: 128 onward) gives arguments against this claim.

1.2.3 Gurindji Kriol

Gurindji Kriol is another example of a mixed language which shows a V-N structural split according to languages. It is spoken by Gurindji people in northern Australia and derives from Gurindji (Pama-Nyungan) and Kriol (English-lexified creole). Gurindji Kriol originates from contact between non-Indigenous colonists and Gurindji people on a cattle station where Gurindji people worked in slave-like conditions. Code-switching provided fertile ground for the formation of this mixed language, which is now the first language of all Gurindji people under the age of 40 (McConvell & Meakins, 2005; Meakins, 2011).

Structurally, Kriol contributes much of the verbal grammar, including tense and mood auxiliaries and transitive, aspect, and derivational morphemes. Gurindji supplies most of the nominal structure, including case and derivational morphology. In this respect, the structure of Gurindji Kriol is quite similar to the V-N split seen in Michif; however, unlike Michif, in Gurindji Kriol nouns and verbs also come from both source languages. Watch an example of Rosie Smiler and her classificatory daughter Jamiesha speaking Gurindji Kriol. Some extracts are given below. Gurindji elements are in italics.

Video: Rosie Smiler and her classificatory grandson Leyton Dodd speak Gurindji Kriol (with Sarah Oscar and Keeshawn Chubb).
















(11) “A fish jabbed Samantha in the hand.”

(12) “Then the monster went and stole the dog.”

Both languages also contribute small amounts of grammar to the systems they do not dominate. For example, the Gurindji continuative suffix is found in the VP, and Kriol determiners are common in the NP. Gurindji Kriol also has Kriol SVO word order, although Gurindji information structure also determines word order to some extent. Complex clauses are constructed using both Gurindji and Kriol strategies; for example, coordinating and relative clauses use Kriol conjunctions and relative pronouns, and subordinate clauses are formed using Gurindji-derived case and factive marking. New structures have also developed, such as an asymmetrical serial verb construction, and old structures have undergone change, such as the ergative marker, now an optional nominative marker with discourse functions.

In terms of the lexicon, Gurindji Kriol derives its lexicon relatively evenly from both languages. Based on a 200-word Swadesh list, 36.6 percent of vocabulary is derived from Kriol (nouns: colors, kin (parents and their siblings), some animals and plants; verbs: most basic), and 35 percent of vocabulary is derived from Gurindji (nouns: artifacts, body parts, kin (siblings, grandparents, in-laws), most animals and plants; verbs: impact, motion, body functions); the remaining 28.4 percent contain synonymous forms from both languages. A similar mixed language is Light Warlpiri, which is also spoken in northern Australia (O’Shannessy, 2013).
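The kind of tally behind these Swadesh-list figures can be sketched roughly as follows. This is a minimal illustration only, not the procedure used in the studies cited: the function name `lexical_shares`, the label "both", and the five toy entries are hypothetical, and the only figures taken from the text are the three reported percentages, which are checked to sum to 100.

```python
from collections import Counter


def lexical_shares(entries):
    """Percentage of list items per source profile.

    `entries` is a list of (gloss, sources) pairs, where `sources` is the
    set of source languages with an attested form for that gloss. Items
    with forms from more than one language are counted under "both".
    """
    counts = Counter()
    for gloss, sources in entries:
        if not sources:
            raise ValueError(f"no source language recorded for {gloss!r}")
        if len(sources) > 1:
            counts["both"] += 1
        else:
            counts[next(iter(sources))] += 1
    total = len(entries)
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical toy annotation (NOT real Gurindji Kriol data).
toy_list = [
    ("item_a", {"Kriol"}),
    ("item_b", {"Kriol"}),
    ("item_c", {"Gurindji"}),
    ("item_d", {"Gurindji"}),
    ("item_e", {"Gurindji", "Kriol"}),
]
shares = lexical_shares(toy_list)

# Figures as reported in the text for the 200-word list.
reported = {"Kriol": 36.6, "Gurindji": 35.0, "both": 28.4}
reported_total = sum(reported.values())
```

Counting synonymous pairs as a separate "both" category, rather than assigning them to one language, is what keeps the three reported shares exhaustive.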

1.3 Converted Languages

Converted languages develop when the ancestral language maintains its lexicon but undergoes a complete restructuring of its morphosyntax on the basis of an introduced language. They differ from the previous categories of mixed languages in that all of the surface forms, including lexicon and morphology, derive from one language. Thus, on the basis of the comparative method, they are generally classified according to the language of their lexicon. This classification, however, belies their bilingual heritage. Note, however, that Takia is classified as Austronesian according to phylogenetic methods using both lexical and structural features (Dunn, Terrill, Reesink, Foley, & Levinson, 2005), which is at odds with the so-called inability of mixed languages to be classified according to historical methods (Thomason & Kaufman, 1988).

Converted languages are the result of a process called metatypy (Ross, 2006), which will be discussed in section 4.3, “Metatypy.” This language contact process affects many languages to varying extents, such as Kannada (Dravidian) on the model of Marathi (Indo-Aryan) and Arvanitic on the model of Greek, and may result in a Linguistic Area or Sprachbund (Muysken, 2008). The two languages offered here as mixed languages differ from these languages due to the extensive nature of restructuring.

1.3.1 Sri Lanka Malay

Sri Lanka Malay is spoken in a number of communities in Sri Lanka by the Malay minority. Sri Lanka Malay is a Malay/Indonesian (Austronesian) variety heavily restructured under the influence of Tamil (Dravidian) and more recently Sinhala (Indo-Aryan), which occurred as a result of sustained social contact with Tamil-speaking Moors and pervasive Malay-Tamil bilingualism among Malay descendants. The result is a language which is unintelligible to Malay speakers despite its Austronesian lexicon (Ansaldo, 2008; Nordhoff, 2009).

Structurally, Sri Lanka Malay developed from an isolating language to an agglutinating language under the influence of Tamil. It has also acquired SOV word order, postpositions, and prenominal determiners and adjectives due to this contact (see papers and references therein from Nordhoff, 2012).










(13) “The teacher sent the children to school.” (Ansaldo, 2008: 27)

(14) “Here do they do daily wage work?” (I. Smith & Paauw, 2006: 164)

The Sri Lanka Malays are descendants of immigrants who were brought to Sri Lanka at different times by Dutch (1656 onward) and British colonists (1796 onward). Although they are called Malays, they came from a number of places, including Banda, Balu, and Java, with only a Malay trade language in common. Traditionally the Malay community has had close ties with the Tamil-speaking Moor community, who are also Muslims. Sri Lanka Malay is now an endangered language, restricted to home use and generally not being spoken by younger generations, with Sinhala/English bilingualism becoming dominant.

1.3.2 Takia

Takia is spoken on Karkar Island off the north coast of Papua New Guinea (PNG). The lexicon is Austronesian, but the language has undergone extensive restructuring on the model of Waskia (Trans New Guinea), which is also spoken on Karkar Island (Ross, 2001, 2006).

Many basic words and morphemes in Takia are cognate with its closest Austronesian neighbors, for example Ronji (spoken 100 km southeast from Takia). Despite the lexical similarities between Takia and Ronji, their syntax is very different. Where word order in Ronji is SVO and TAM is marked by preverbal auxiliaries, Takia is SOV with TAM enclitics, as shown in (15) and (16).









(15) Takia (Austronesian): “I shall eat banana.” (Ross, 2006: 95)

(16) Ronji (Austronesian): “I shall eat banana.” (Ross, 2006: 95)

Instead, the syntax of Takia seems to match languages in another language family, the Trans New Guinea (TNG) languages. A TNG language spoken on Karkar Island is Waskia. Takia shares no cognate forms with Waskia, but their morphosyntactic structures are strikingly similar. Word order in both languages is SOV, TAM is marked using enclitics, and NP order is noun-determiner, as shown in (17) and (18). In addition, many bimorphemic words or idioms in Takia match the semantics of Waskia, although the phonological forms themselves are Austronesian (Ross, 2001: 144; 2006: 95).











(17) Takia (Austronesian): “The man is hitting me.” (Ross, 2001: 140)

(18) Waskia (TNG): “The man is hitting me.” (Ross, 2001: 140)

How Takia, an Austronesian language, came to acquire the syntactic patterning of Waskia, a Trans New Guinea language, is not entirely clear. Various hypotheses exist, such as mixed marriages between pre-Takia and Waskia groups, the migration of pre-Takia people to Karkar Island due to volcanic activity in their original home on mainland Papua New Guinea, or trading partnerships between pre-Takia and Waskia men.

2. Social Functions and Origins of Mixed Languages

The genesis of mixed languages is a product of expressive rather than communicative needs, which contrasts them with pidgin and creole languages (Golovko, 2003: 191; Muysken, 1997b: 375). Pidgin and creole languages are the result of a need for communication between people of a number of language groups, whereas mixed languages are created in situations where a common language already exists and communication is not at issue. In this respect, mixed languages serve as an expression of an altered identity, whether new or differing significantly from an older identity. The following sections consider commonalities in the social functions of mixed languages and the sociohistorical contexts of their genesis.

Functions of Mixed Languages

Many mixed languages are spoken by new ethnic groups. Some of these groups find their origins in mixed marriages. For example, Michif speakers are the children of Cree mothers and French fathers. They call themselves Métis, which reflects the mixed identity of their group. One mixed language which marks a new identity without the background of mixed marriage is Old Helsinki Slang. In this case, the new identity was formed within mixed Swedish- and Finnish-speaking gangs.

Mixed languages are also spoken by people who do not constitute a separate ethnic identity but regard their language as emblematic of a continuing ancestral identity. Speakers of Media Lengua do not separate themselves from Quechua people, although they are a subgroup who identify to a certain extent with urban Hispanic society. Speakers of Gurindji Kriol also do not separate themselves from Gurindji people. This mixed language represents an attempt to maintain Gurindji under the continuing colonial pressure of English, and indeed the mixed language is usually referred to as “Gurindji” (Meakins, 2012: 109).

Related to the association of mixed languages with a new or continuing identity is their use as the primary language of a speech community. Indeed, in the case of Michif, speakers generally no longer know either of the contributing languages, French or Cree. The majority of mixed languages, however, are spoken alongside one or more of their source languages. Smith (2000) calls these languages symbiotic mixed languages. For example, Mednyj Aleut was spoken concurrently with a number of Aleut dialects and Russian, although it is not clear whether Mednyj Aleut speakers had control of one or more of its input languages. Similarly, Media Lengua is learned either as a first or second language. Middle-aged speakers of this mixed language may also have access to both input languages. Younger speakers tend to speak Spanish better, and older speakers Quechua.

3. Sociohistorical Origins of Mixed Languages

Mixed languages derive from one of three sociohistorical settings: mixed marriages, migration, or cultural incursion. None of these contexts is particular to the formation of mixed languages. For example, creolization and language shift occur in colonial settings, and bilingualism and code-switching are found in all of these contexts.

The first category involves mixed marriages between men from one society and women from another. The children of these mixed marriages are said to form their own distinct cultural identity, with the mixed language an enactment of this identity. Michif is the classic example of this type of mixed-language genesis. This Canadian mixed language and its speakers are the product of marriage between French-Canadian fur traders and Amerindian women. Mednyj Aleut also probably emerged from mixed marriages, in this case between Aleut women and Russian fur seal traders in the early 1800s. Old Helsinki Slang also fits into this category, although it is not the result of marriage as such. Nonetheless, it developed from a mixed group of Finnish and Swedish youths who formed a new identity through speaking it.

The second and third categories consist of mixed languages where a mixing of ethnic groups has not occurred but rather one group has dominated over another, providing the context for the development of the mixed language. In some cases, a mixed language emerges as the result of a minority group migrating to a new region where the dominant language is different from their own. People may have migrated to a new region to escape persecution (Ma’á), for economic reasons (Media Lengua), for environmental reasons (Takia), or because they were brought by another group (Sri Lanka Malay). In other cases, mixed languages arise when groups are colonized and become minorities in their own country. Gurindji Kriol developed as a result of the invasion of Australia by British colonists.

In all of these situations, mixed languages emerge as the result of a change in the dominance of languages when speakers shift towards the introduced language. In some cases, this process does not go to completion, and what remains is the mixed language. Thus the mixed language can have varying amounts of material from the introduced language, from borrowed nouns right up to morphology (see section 4.1, “Borrowing and Code-Switching Approaches”). Media Lengua is an example of a mixed language which resulted from a partial shift. The relexification of Quechua with Spanish was a consequence of Quechua men becoming more fluent in Spanish. Gurindji Kriol progressed further along the scale, incorporating large amounts of structural material from Kriol as the speakers shifted away from their ancestral language, Gurindji. In other cases, the shift goes to completion and the mixed language forms as the result of an attempt to reverse this shift by reintroducing material from the ancestral language. In this scenario, the ancestral language is still available at the time of genesis, perhaps still spoken by older generations. Generally, only lexical material is found from the ancestral language in the mix. Mixed languages, such as Angloromani or Ma’á, which utilize lexical reservoirs from the ancestral language could be regarded as examples of such a linguistic U-turn.

4. Structural Processes Involved in Mixed Language Genesis

As sections 1, “Typology of Mixed Languages,” and 2, “Social Functions and Origins of Mixed Languages,” show, mixed languages are typologically diverse, have a range of social functions, and are the product of different historical circumstances. Furthermore, there is no clear link between the structures of mixed languages and their functions or sociohistorical origins. For example, although Michif and Gurindji Kriol are both structural mixes, Michif is the result of mixed marriages and Gurindji Kriol is the result of language shift by a single cultural group. Similarly, although Angloromani and Old Helsinki Slang exhibit a lexicon-grammar split, Angloromani represents the persistence of the Romani identity, whereas Old Helsinki Slang expressed a new identity for Finnish and Swedish street gangs.

This diversity is reflected in the linguistic practices of the speech communities which shaped the mixed languages as they emerged. Mixed languages find their genesis in one of three processes: borrowing or code-switching, relexification, and metatypy. These processes can be distinguished by whether a replication of form or structure or both is involved. Form refers to the phonological shape of the lexical/morphological material, and structure to the syntactic or semantic patterns of a language. Mixed languages which are the result of code-switching or borrowing replicate both forms and structures from their source languages in the resultant mix. On the other hand, mixed languages which stem from relexification replicate only form, and those which stem from metatypy only structure:

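Table 1. Processes which lead to the formation of mixed languages

| Process | Form replicated | Structure replicated |
|---|---|---|
| Borrowing or code-switching | yes | yes |
| Relexification | yes | no |
| Metatypy | no | yes |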

4.1 Borrowing and Code-Switching Approaches

The genesis of many mixed languages is probably the result of large-scale borrowing or switching. In the case of one mixed language, Gurindji Kriol, empirical evidence exists for the code-switching scenario (McConvell & Meakins, 2005). Borrowing or code-switching involves the replication of lexical and morphological material from the source language into the target language. Depending on the direction of shift, the target language may be the ancestral or introduced language. Mixed languages are then a halfway house of language shift, no longer classifiable as a variety of one language or the other. We begin with a prominent borrowing theory of mixed language genesis (Thomason & Kaufman, 1988), relate it to a major code-switching approach (Myers-Scotton, 2003), and discuss the mixed languages which find their origin in these types of processes.

Thomason and Kaufman (1988) base a theory of mixed language genesis on their borrowing scale, which relates social factors to the borrowability of particular linguistic categories, from nouns through to substantial structural borrowing, including that of inflectional morphology. The scale is implicational: it assumes that where derivational morphology has been borrowed, conjunctions have already been borrowed. Their borrowing scale mirrors previously proposed hierarchies.

Table 2. Thomason and Kaufman’s borrowing scale (1988: 74–75)

| Degree of Contact | Borrowing Type | Features Borrowed |
|---|---|---|
| 1. Casual contact | lexical | nonbasic vocabulary before basic |
| 2. Slightly more intense contact | lexical | functional vocabulary (e.g., conjunctions and adverbs) |
| | structural | only new functions borrowed |
| 3. More intense contact | lexical | pre/postpositions, derivational affixes, inflectional affixes (attached to stem), pronouns, low numerals |
| | structural | change in word order, borrowing postpositions in a prepositional language |
| 4. Strong cultural pressure | structural | extensive word order change, inflectional affixes (e.g., case) |
| 5. Very strong cultural pressure | structural | typological disruption, changes in word structure (e.g., adding prefixes in suffixing language), change from flexional to agglutinative morphology |
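
The implicational reading of the scale can be made concrete with a short Python sketch. It is an illustration only, not part of Thomason and Kaufman's proposal: the names `DEGREES`, `is_implicational`, and `contact_degree`, and the encoding of the five degrees as integers, are assumptions introduced here.

```python
# Shorthand labels for the five degrees of contact in Table 2.
# The integer encoding is an assumption made for this sketch.
DEGREES = {
    1: "casual contact",
    2: "slightly more intense contact",
    3: "more intense contact",
    4: "strong cultural pressure",
    5: "very strong cultural pressure",
}

def is_implicational(borrowed):
    """Return True if borrowing at any degree is matched by borrowing at every lower degree."""
    if not borrowed:
        return True
    return set(borrowed) == set(range(1, max(borrowed) + 1))

def contact_degree(borrowed):
    """Return the label of the highest degree reached, or None if nothing is borrowed."""
    return DEGREES[max(borrowed)] if borrowed else None

# A profile with borrowing at degrees 1 to 3 fits the scale;
# one with degree 3 features but no degree 2 features does not.
consistent = is_implicational({1, 2, 3})
gap = is_implicational({1, 3})
```

Under this encoding, `consistent` is `True` and `gap` is `False`: a language showing degree 3 borrowings without degree 2 borrowings would be a counterexample to the implicational claim.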

Related to theories of borrowing is the idea that mixed languages are derived from code-switching, specifically insertional code-switching whereby elements from one language insert into another language’s morphosyntactic frame or matrix. Myers-Scotton (2003) theorizes the move from insertional code-switching to a mixed language within her Matrix Language Frame model, labelling the transition the Matrix Language Turnover Hypothesis. This hypothesis is concerned with the change in dominance of the participating languages, and Myers-Scotton proposes that mixed languages arise when there is a turnover under way which does not go to completion. According to Myers-Scotton, mixed languages may stop at different places, which explains why they surface in different forms and with the split in different places. Similar to Thomason and Kaufman, Myers-Scotton developed a scale of likely switches, with content words such as nouns easily switched and inflectional morphology impossible to switch (in her view, although Gurindji Kriol is an exception).

Under both borrowing and code-switching accounts, mixed language formation may halt at the least disruptive end of the scale and exhibit only lexical borrowings. The L-G languages are a good example because they are characterized by a clear division between the lexicon and the grammar where these systems are dominated by a different source language. An example is Old Helsinki Slang (Finnish grammar, Swedish lexicon). Note that this category includes mixed languages where the language contributing the grammar is the ancestral language, for example Old Helsinki Slang, and languages where the introduced language provides the grammar, for example Angloromani and Ma’á. In the case of Angloromani and Ma’á, a subset of vocabulary from the ancestral language (Romani and Cushitic) is maintained as a lexical reservoir and exists in parallel with the lexicon of the grammar language (English and Mbugu). A contemporary process of paralexification replaces vocabulary utterance by utterance. In this respect, paralexification occurs synchronically and is not a diachronic process, and although this process resembles code-switching, paralexification does not require bilingualism.

At the other end of the scale are structural mixes which contain inflectional morphology from both languages. At this stage, other borrowings such as lexical and more minor structural borrowings are assumed. A number of mixed languages exhibit this grammar mixture, including Michif, Mednyj Aleut, and Gurindji Kriol. For example, inflectional morphology from both French and Cree is present in Michif. Verbal inflections are derived from Cree, and in the NP Michif preserves both French plural marking and adjectival agreement. Similarly, Gurindji Kriol combines Kriol, the language of the verbal inflectional categories (tense and mood markers), with Gurindji nominal inflections in the form of case marking, both syntactic (ergative, dative, possessive) and semantic (locative, allative, ablative). Mednyj Aleut has also retained inflectional morphology from both source languages. It includes both Aleut nominal inflections such as two case distinctions (absolutive and relative) and Russian finite verbal inflectional morphology, including portmanteau morphemes which express tense, number, and person.


Figure 1. A continuum of grammatical mixing in mixed languages. Meakins, F., 2013. Mixed languages, in: Matras, Y., Bakker, P. (Eds.), Contact Languages: A Comprehensive Guide. Mouton, Berlin, pp. 159–228.

The situation described for Michif, Mednyj Aleut, and Gurindji Kriol is exceptional given the fragility of inflectional morphology in other language contact situations. For example, inflectional morphology is rarely borrowed and is mostly derived from the more dominant language in code-switching. Indeed, Matras (2003: 158) suggests that a particular feature of mixed languages is the seemingly unconstrained borrowing of grammatical elements, which in the past have been labelled as “loan-proof.”

4.2 Relexification

The process of relexification is more familiar as an account of the origin of creole languages, but it has also been offered as an explanation for the formation of Media Lengua (Muysken, 1981) and Michif (Bakker, 1989). Relexification differs from borrowing or code-switching in that it is only the phonological form which is borrowed from another language rather than the whole structure of the lexical and morphological item. This form is then mapped onto the recipient language’s own structure. For example, while the form of a verb may be borrowed, it is mapped onto the predicate argument structure of the recipient language.

In the case of Media Lengua, Spanish forms have been borrowed into Quechua but in the process have adopted the structure and meaning of the equivalent Quechuan forms. For example, Muysken (1981: 57) observes that Spanish pronominal forms have been mapped onto the Quechuan pronominal paradigm. In the case of third person singular pronouns, Quechua does not distinguish masculine/feminine in the third person, whereas Spanish does. Media Lengua also does not make this distinction but uses the form el as a general third person pronoun, which is a phonological compromise between the Spanish él “he” and ella “she.”

4.3 Metatypy

All of the converted languages are the result of the diachronic process of metatypy. Metatypy involves the typological restructuring of one language on the model of another while maintaining the forms of the original language (Ross, 2006: 95). The language that undergoes restructuring is emblematic of the speech community’s identity (i.e., their ancestral language), and the language on which they restructure their traditional language is one used to communicate with another speech community (Ross, 2001: 146). Thus metatypy is a process by which speakers who speak two languages essentially have one grammar (semantic and morphosyntactic organization) and two lexicons in operation. This process lessens the cognitive effort required to compute or produce two languages.

Sri Lanka Malay and Takia are mixed languages which find their roots in metatypy. They have both maintained the forms, including vocabulary and morphology, of their ancestral languages, Malay/Indonesian (Austronesian) and pre-Takia (Austronesian) respectively, but restructured according to the language of the dominant speech community, Tamil (Dravidian) and Waskia (Trans New Guinea) respectively. Metatypy has affected not only the syntactic organization of a clause (e.g., word order) but also the morphological profile. For example, in the case of Sri Lanka Malay, Smith, Paauw, and Hussainmiya (2004: 2004) observe that Malay prepositions have become postpositional case-markers under the influence of Tamil, which is a suffixing and dependent-marking language. Similarly, Takia has lost its few prepositions and calqued a more substantial set of postpositions that almost exactly matches Waskia in semantic organization. Thus the postpositions are Austronesian in form but Trans New Guinea in organization.

Table 3. Grammatical calquing in Takia (Ross, 2001: 143)

[Table only partially recoverable. Surviving cells: the forms “na, te”, “se, te, i”, and “fo, futo”; the functions location “in”, location “on”, and accompaniment (comitative).]


5. Summary of Typological, Functional, and Sociohistorical Profile of Mixed Languages

Below is a summary of mixed languages, indicating their typology, function, and sociohistorical and linguistic background. Few generalizations can be made. As discussed in section 4, “Structural Processes Involved in Mixed Language Genesis,” there is little link between sociohistorical background and the resultant structure of mixed languages. There is more correlation between the resultant structure and the linguistic processes which went into the formation of the language: for example, structural mixes and most L-G languages are the result of borrowing or code-switching, whereas converted languages stem from a process of metatypy.

Table 4. Mixed languages by typology, function, sociohistorical, and linguistic background.

| Typology | Language | Function | Sociohistorical Background | Linguistic Background |
|---|---|---|---|---|
| L-G Language, Ancestral grammar | Media Lengua | new group | migration, language shift | |
| | Old Helsinki Slang | new group | mixed group, language shift | |
| L-G Language, Introduced grammar | Angloromani | continuing ancestral identity | migration, language reclamation | code-switching/borrowing (paralexification) |
| | Ma’á | continuing ancestral identity | migration, language reclamation | code-switching/borrowing (paralexification) |
| Structural Mix | Michif | new group | colonization, mixed marriage | |
| | Mednyj Aleut | continuing ancestral identity? | colonization, mixed marriage | |
| | Gurindji Kriol | continuing ancestral identity | colonization, language shift | |
| Converted Language | Sri Lanka Malay | continuing ancestral identity | | |
| | Takia | continuing ancestral identity | | |


6. Mixed Languages as Autonomous Linguistic Systems

One of the criticisms often leveled at descriptions of mixed languages is the lack of autonomy of the language variety presented (Bakker, 2003; Meakins, 2012). The term “autonomous language system” refers to the ability of the language to function as a stand-alone synchronous linguistic entity with only minimal continuing input from its source languages. Thus changes in the source languages do not feed into the mixed language and vice versa. Whether such a level of autonomy is possible for a mixed language is indeed questionable, given that most are spoken alongside one or both of their source languages. Furthermore, there is often a close diachronic relationship between other mixing practices and mixed languages. For example, L-G languages strongly resemble insertional code-switching (Bakker, 2003: 129), and code-switching has been shown to precede the formation of some mixed languages, for example Gurindji Kriol (McConvell & Meakins, 2005). Nonetheless, criteria have been developed to support the claim of language autonomy in a mixed language: (a) the stability of the language, and (b) the independent development of the source or mixed language. A detailed discussion of these criteria can be found in Meakins (2013), and their application to Gurindji Kriol can be found in Meakins (2012).

6.1 Language Stability

This section offers some indicators which may be used to judge the stability of a mixed language. The clearest demonstration of language stability is when a mixed language is spoken outside of the bilingual context in which it arose, that is, when speakers are no longer fluent in the source languages. Michif is one such example. Although Michif is derived from French and Cree, most of its speakers are not fluent in either language. Indeed nowadays most Michif speakers are elderly, and the main language of Michif communities has become English (Bakker, 1997: 74). In other situations, mixed languages have a symbiotic relationship with their source languages. For example, Ma’á (Inner Mbugu) speakers are also fluent in (Outer) Mbugu and even code-switch between these languages (Mous, 2003: 86 onwards). Here the case for language autonomy becomes quite blurred because there is potential for continuing influence from the source languages, making the language less stable. The claim that Ma’á operates as a closed linguistic system that is only minimally influenced by its continued contact with the source languages is difficult to maintain in such a language ecology.

Another measure of stability in a mixed language is the degree of consistency both between and within speakers in their use of lexicon and grammar. For example, the choice of lexical items and syntactic constructions is very consistent across speakers in Gurindji Kriol. As a result, Gurindji Kriol speakers use virtually identical constructions to express the same event. This point can be demonstrated by looking at a small subset of data consisting of 18 tokens of the sentence “the dog bit the man on the hand” from 18 different speakers (Meakins, 2012: 116). Of these 18 sentences, the Gurindji words warlaku “dog,” marluka “old man,” and wartan “hand” are used in all 18 sentences, with the Kriol baitim “bite” used in 89 percent of sentences in variation with the Gurindji equivalent katurl. Syntactically, all pronouns present are Kriol-derived free forms, and similarly any verbal inflection found is of Kriol origin. The Gurindji-derived ergative marker -ngku is used in 61 percent of the sentences (see note 2), and the locative marker -ta is found 83.5 percent of the time, with the Kriol preposition la used in the remaining sentences.
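
Rates of this kind are simple proportions over the coded tokens, and the calculation can be sketched in Python. The token lists below are hypothetical: the counts of 16 out of 18 for baitim and 11 out of 18 for -ngku are back-calculated here from the rounded figures of 89 and 61 percent and are not reported in the source, and the function name `variant_rates` and the label `unmarked` are likewise assumptions of this sketch.

```python
from collections import Counter

def variant_rates(tokens):
    """Return each variant's share of the tokens as a percentage, rounded to one decimal."""
    counts = Counter(tokens)
    total = len(tokens)
    return {variant: round(100 * n / total, 1) for variant, n in counts.items()}

# Hypothetical coding of the 18 sentences (counts inferred from the rounded
# percentages in the text, not taken from the original data set).
verb_tokens = ["baitim"] * 16 + ["katurl"] * 2
ergative_tokens = ["-ngku"] * 11 + ["unmarked"] * 7

verb_rates = variant_rates(verb_tokens)
ergative_rates = variant_rates(ergative_tokens)
```

With these assumed counts, `verb_rates["baitim"]` is 88.9 and `ergative_rates["-ngku"]` is 61.1, which round to the 89 and 61 percent given above. Coding each sentence for one variable at a time in this way is the usual starting point for a variationist analysis of functionally equivalent forms.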


The level of uniformity in lexical and syntactic choices shown by Gurindji Kriol speakers supports its status as a language independent of its sources. For example, though speakers regularly hear Gurindji verbal morphology from older speakers, they consistently use Kriol tense and aspect markers, e.g. bin “past” in (19).

Related to the issue of stability is variation. The presence of variation in mixed languages is seldom discussed and sometimes even played down in order to avoid questions of autonomy (but see Bakker, 1997: 159; Mous, 2003: 7 for observations of variation in Michif and Ma’á, respectively). Nonetheless, it plays an important role in the formation of mixed languages and continues to affect their evolution. The presence of variation does not undermine the notion of an autonomous language system. Variation has long been recognized as a normal and integral part of all language systems. Within the context of mixed languages, variationist methodology offers a way of characterizing the use of functionally equivalent forms, such as the Gurindji locative case marker or Kriol preposition, as discussed in (19) for Gurindji Kriol.

The presence of variation is not indicative of instability or a lack of language autonomy, but can be treated as meaningful within a contained linguistic and social system. For example, the Gurindji-derived ergative marker is only used optionally, which contrasts with the categorical use of the ergative marker in Gurindji. One interpretation of this variation may be that the Gurindji Kriol system is unstable; however, Meakins (2011: Ch. 9) shows that the variable application of the ergative marker patterns in a coherent manner. The ergative marker is more likely to appear if the agent is inanimate, found postverbally, and in conjunction with a coreferential pronoun. This pattern is interpreted as an indication that the main function of the ergative marker in Gurindji Kriol is discourse-related; specifically, its presence highlights the agentivity of a subject nominal. Similar analyses exist for noncontact languages with optional ergative marking (McGregor, 2010). Studies of this kind demonstrate that variation in a mixed language system is not a sign of the fragility of the language but rather a part of a systematic grammar.

6.2 Independent Development

Language autonomy may also be demonstrated by the independent development of the mixed language and its sources. In this scenario, a change in the source languages does not necessarily imply a change in the mixed language and vice versa. Bakker (2003: 126) gives examples of cases where changes in the source languages are not reflected in the mixed language. The development of unique forms or functions in the mixed languages is another indication of an autonomous language system.

Mixed language developments are not always reflected in the source languages. For example, in Sri Lanka Malay, Malay-derived prepositions have become postpositional case-markers under the influence of Tamil (Smith et al., 2004: 2004). What is particularly interesting is that syncretism occurs between dative and accusative marking (distinguished by optionality—accusative marking is optional where dative marking is not). This syncretism is not observed in any of its source languages. Tamil and Sinhala distinguish these categories with separate markers, and Vehicular Malay marks the dative but not the accusative with a preposition. Further, although the forms of the case suffixes come from Vehicular Malay, the origin of the accusative/dative case suffix -n(y)a(ng) is obscure. Thus in both distribution and form, the accusative/dative marker is a unique form in Sri Lanka Malay, and there is no evidence that this marker has fed back into its source languages.

Forms in a mixed language may also develop functions different from those they have in the source language. For example, in Old Helsinki Slang, verbs (of both Swedish and Finnish origin) are conjugated according to a mix of the first and fourth conjugational classes of a Finnish dialect spoken near Helsinki (Jarva, 2008: 73). Additionally, many words of both Swedish and Finnish origin are augmented by slang suffixes such as -ari, -is, and -tsi. These suffixes are unique to Old Helsinki Slang, differing from the epenthetic vowels required to borrow Swedish words into standard Finnish (Jarva, 2008: 70).

7. Mixed Languages: A Cohesive Class?

A variety of languages have been classified as mixed languages. Lexically, they range from languages that derive an extraordinary amount of vocabulary from one language and their grammar from another (Media Lengua, Old Helsinki Slang) to languages that selectively replace lexical items according to specific communicative contexts (Angloromani, Ma’á). Mixed languages also differ according to structure. In some mixed languages, the structure is clearly derived from one language (Media Lengua, Angloromani, Ma’á, Old Helsinki Slang). In others, the two source languages contribute relatively equal amounts of structure to the mix in a way that contrasts dramatically with other language contact scenarios (Michif, Mednyj Aleut, Gurindji Kriol). Yet other examples have restructured one language on the basis of another such that, on the surface, the language “looks” like one of its sources, but it has mapped these forms onto the other language’s grammar (Sri Lanka Malay, Takia).

Despite these differences, what is shared by these languages is that they have emerged in situations of bilingualism where a common language is already present. In this respect, they do not serve a communicative function but rather are markers of an in-group identity, whether a new identity created through mixed marriages or groups (Michif, Mednyj Aleut, Old Helsinki Slang) or the maintenance of an old identity which is under threat (Angloromani, Gurindji Kriol, Ma’á). Despite this common sociohistorical cradle, little more can be predicted. It appears from the variation in structural outcomes that different contact situations can result in similar mixed languages, and different mixed languages may arise from similar contact situations.

Further Reading

Bakker, P. (2013). Michif. In S. Michaelis, P. Maurer, M. Haspelmath, & M. Huber (Eds.), The survey of pidgin and creole languages, Vol. III (pp. 158–165). Oxford: Oxford University Press.

Hoff, B. (1994). Island Carib, an Arawakan language which incorporated a lexical register of Cariban origin, used to address men. In P. Bakker & M. Mous (Eds.), Mixed languages: 15 case studies in language intertwining (pp. 161–168). Amsterdam: IFOTT.

Huttar, G., & Velantie, F. (1997). Ndyuka-Trio Pidgin. In S. G. Thomason (Ed.), Contact languages: A wider perspective (pp. 99–124). Amsterdam: John Benjamins.

Matras, Y. (2012). A grammar of Domari. Berlin: Mouton de Gruyter.

Meakins, F. (2013). Gurindji Kriol. In S. Michaelis, P. Maurer, M. Haspelmath, & M. Huber (Eds.), The survey of pidgin and creole languages, Vol. III (pp. 131–139). Oxford: Oxford University Press.

Muysken, P. (2013). Media Lengua. In S. Michaelis, P. Maurer, M. Haspelmath, & M. Huber (Eds.), The survey of pidgin and creole languages, Vol. III (pp. 143–148). Oxford: Oxford University Press.

Mous, M. (2013). Mixed Ma’á/Mbugu. In S. Michaelis, P. Maurer, M. Haspelmath, & M. Huber (Eds.), The survey of pidgin and creole languages, Vol. III (pp. 42–49). Oxford: Oxford University Press.

O’Shannessy, C. (2009). Language variation and change in a north Australian Indigenous community. In D. Preston & J. Stanford (Eds.), Variationist approaches to Indigenous minority languages (pp. 419–439). Amsterdam: John Benjamins.

Slomanson, P. (2013). Sri Lanka Malay. In S. Michaelis, P. Maurer, M. Haspelmath, & M. Huber (Eds.), The survey of pidgin and creole languages, Vol. III (pp. 77–85). Oxford: Oxford University Press.


(1.) Glossing abbreviation: CIS=cislocative.

(2.) Which is consistent with the pattern of optional ergativity seen in Gurindji Kriol; see section 1.2.3, “Gurindji Kriol.”