

PRINTED FROM the OXFORD RESEARCH ENCYCLOPEDIA, LINGUISTICS (© Oxford University Press USA, 2018). All Rights Reserved. Personal use only; commercial use is strictly prohibited (for details see Privacy Policy and Legal Notice).

Date: 16 November 2018

Genealogical Classification in Historical Linguistics

Summary and Keywords

Different methods exist for classifying languages, depending on whether the task is to work out the relations among languages already known to be related—internal language classification—or whether the task is to establish that certain languages are related—external language classification.

The comparative method in historical linguistics, developed during the latter part of the 19th century, represents one method for internal language classification; lexicostatistics, developed during the 1950s, represents another. Elements of lexicostatistics have been transformed and carried over into modern computational linguistic phylogenetics, and efforts are currently also being made to automate the comparative method. Recent years have seen rapid progress in the development of methods, tools, and resources for language classification. For instance, computational phylogenetic algorithms and software have made it possible to classify many languages using explicit models of language change, and data have been gathered for two thirds of the world’s languages, allowing for rapid, exploratory classifications. There are also many open questions and avenues for future research, for instance: What are the real-world counterparts to the nodes in a family tree structure? How can shortcomings in the traditional method of comparative historical linguistics be overcome? How can our understanding of the results that computational linguistic phylogenetics has to offer be improved?

External language classification, a notoriously difficult task, has also benefitted from the advent of computational power. While, in the past, the simultaneous comparison of many languages for the purpose of discovering deep genealogical links was carried out in a haphazard fashion, leaving too much room for the effect of chance similarities to kick in, this sort of activity can now be done in a systematic, objective way on an unprecedented scale. The ways of producing final, convincing evidence for a deep genealogical relation, however, have not changed much. There is some room for improvement in this area, but even more room for improvement in the way that proposals for long-distance relations are evaluated.

Keywords: comparative method, regular sound change, reconstruction, post hoc shared innovation, lexicostatistics, computational linguistic phylogenetics, mass comparison, ASJP, distance-based, character-based, Oswalt’s shift test

1. Introduction

This overview is divided into two main sections, one on internal and one on external language classification. Internal language classification is the application of methods for uncovering the phylogenetic relations among a set of languages already known to be related. Several of the relevant methods have been around for a long time and their properties are generally well understood. Others have entered the field of linguistics in more recent times from biology. External language classification concerns the comparison of languages with the aim of establishing that they are related either by common descent or by contact (or both). The recipes for doing external language classification are not well established, and proposals regarding long-distance relationships, be they genealogical or areal, therefore tend to be beset with controversy.

2. Internal Language Classification

2.1. The Comparative Method

The beginning of scientifically based language classification can be identified with the first publication of a family tree of Indo-European languages by Schleicher (1853). Such trees had been around in biology since the beginning of the 19th century, although, just as in linguistics, they do not seem to have been used systematically until the latter half of the 19th century.

There is more to a family tree than meets the eye: it is necessarily based on certain criteria, which can differ considerably from case to case, and there are also different ways of interpreting trees. Schleicher is not explicit about the criteria for specific aspects of his tree, but it is clear that his tree, in general, is what would be referred to in the early 21st century as a phenetic tree, that is, a structure meant to depict the amount of similarity between languages (including intermediate proto-languages). An important improvement in the thinking behind trees came with the work of Leskien (1876), which was written in response to a question posed by a learned society about the relationship between Balto-Slavic and Germanic within Indo-European. Leskien sets out to look for shared innovations, is not able to find them, and thus concludes that Balto-Slavic and Germanic do not form a subgroup within Indo-European. Since Leskien’s work, the criterion of shared innovations has become standard in the construction of family trees through the application of the comparative method. Brugmann (1884) is another important contribution to the methodology of constructing trees. Schmidt (1872) had argued that similarities among Indo-European languages are distributed in a chain-like rather than a tree-like manner, which would argue for something like a wave model of relationships, but Brugmann instead argues that it is possible to distinguish systematically between shared innovations that belong to single subgroups and cases of shared retentions, independent innovations, and diffusion. Once these different sources of similarity have been teased apart, the tree model can be rescued.

An opposition between scholars who favor trees and those who consider trees too much of an idealization, or even a misrepresentation, of the way that languages relate to one another continues to exist, however. Among the representatives of the former position are some practitioners of computational phylogenetic approaches who take trees to adequately represent population histories (Gray & Jordan, 2000; Gray, Drummond, & Greenhill, 2009; Bouckaert et al., 2012) and argue that lexical borrowing rates as high as around 30% (Greenhill, Currie, & Gray, 2009) will not affect the structure of such trees to any major extent. The implication here is that borrowing is mainly a nuisance, the interesting information being in the tree. At the other end of the spectrum are scholars who question whether the tree model is useful at all. Some prefer a looser version of the model, with weak ‘linkages’ of intersecting rather than nested subgroups (Ross, 1988, 1997), whereas others prefer no tree at all, but instead isoglosses along the lines of traditional dialectology (François, 2015). The majority of historical linguists probably take an intermediate position, but one leaning more towards the tree model. Thus, most commonly, the tree model is used as the basis of a classification, but conflicting signals are not ignored; instead, they are identified and explained to the greatest possible extent. The following quote, in which the German term Stammbaum refers to the tree model, provides a good argument for this approach.

… the Stammbaum hypothesis is always preferable as a first hypothesis because it is falsifiable; it is easy to see where it doesn’t work and easy to formulate alternatives (either alternative Stammbäume or networks). It is much easier to fit recalcitrant data into a network model; for exactly that reason a hypothesis of non-treelike diversification is less useful and should be preferred only when reasonable alternatives have proved untenable.

(Ringe & Eska, 2013, p. 263)

Distinguishing shared innovations from the three confounds (shared retentions, independent innovations, and diffusion) is usually done on a case-by-case basis, weighing the pros and cons of regarding characters (pieces of comparative data) as shared innovations or as a particular kind of confound. Little methodological improvement has been made to this procedure, but a couple of contributions are worth mentioning. First, Nakhleh, Ringe, and Warnow (2005), building on a series of previous papers, work with a rich set of Indo-European data and describe a procedure mixing algorithmic and linguistic thinking for finding a tree that is maximally compatible with the characters. They allow for positing ‘contact edges’ (links between branches that represent borrowing events) but strive to minimize the number of such contact edges. Second, Brown, Holman, and Wichmann (2013) provide statistics on how often different segments enter into regular correspondences across languages. This information can be used to gauge how probable it is that a certain sound change could have happened independently in two different branches of a family.
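Compatibility, the notion behind the search for a maximally compatible tree, can be illustrated for the simplest case of binary characters. The sketch below is an illustration under stated assumptions, not the procedure of Nakhleh, Ringe, and Warnow: it applies the standard four-gamete test to pairs of characters on an unrooted tree, and the languages A to D and their character values are invented. A pair that fails the test is the kind of conflict that has to be explained as homoplasy, as a contact edge, or by choosing a different tree.

```python
from itertools import combinations

def compatible(char_a, char_b):
    """Four-gamete test for two binary characters (dicts: language -> 0/1).

    On an unrooted tree, two binary characters can both be explained with a
    single change each unless all four state combinations occur.
    """
    shared = char_a.keys() & char_b.keys()  # skip languages with missing data
    combos = {(char_a[lang], char_b[lang]) for lang in shared}
    return len(combos) < 4

def incompatible_pairs(characters):
    """List every pair of characters that cannot both fit one tree without homoplasy."""
    return [(a, b) for a, b in combinations(sorted(characters), 2)
            if not compatible(characters[a], characters[b])]

# Hypothetical characters for four hypothetical languages A-D.
characters = {
    "innovation_1": {"A": 1, "B": 1, "C": 0, "D": 0},
    "innovation_2": {"A": 1, "B": 1, "C": 1, "D": 0},
    "innovation_3": {"A": 0, "B": 1, "C": 1, "D": 0},
}
print(incompatible_pairs(characters))  # [('innovation_1', 'innovation_3')]
```

Here innovation_1 groups A with B, while innovation_3 groups B with C; no single tree accommodates both without an extra change.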

Although the comparative method is usually regarded as the ‘gold standard’ for language classification, it has some inherent limitations that are not easily overcome. Since it relies on shared innovations for subgrouping, a dilemma arises when most languages of a family have successfully been subgrouped by that criterion, while some residual languages, even if they look similar, defy classification because they do not evince shared innovations. An example can be drawn from the Mayan family of Mesoamerica. Although some issues of high-order subgrouping are debated, the following subgroups are uncontroversial: Huastecan, Yucatecan, Ch’olan-Tzeltalan, and Eastern Mayan. In addition, all classifications include some version of a Q’anjob’alan subgroup, although it is generally recognized (Campbell, 1977, pp. 100–101) that, however similar these languages are, there are in fact no known shared innovations defining such a subgroup. The group includes at least Q’anjob’al, Akatek, and Jakaltek, with some other possible members whose status is debated. To be absolutely faithful to the comparative method, one would have to attach each of these three or more languages to the next level in the classification, which happens to be the very root of the family, i.e., the proto-Mayan node (some Mayanists operate with an intermediate Western Mayan subgroup, but, as it happens, this hypothetical subgroup is also not defined by shared innovations). Nevertheless, all modern classifications of Mayan languages recognize a Q’anjob’alan subgroup.

A small, ill-defined subgroup such as Q’anjob’alan may not seem like an alarming problem, but this sort of problem is compounded when it recurs in a larger family. This is the case for Austronesian, a family of more than one thousand languages and the world’s largest in terms of prehistoric geographic extent. As discussed by Kikusawa (2015), several of the subgroups that recur in current classifications are ill-defined in the same sense as Q’anjob’alan, only here we are dealing with hundreds of languages. As a rule of thumb, the better-defined subgroups are the ones whose languages are spoken by populations moving towards the east, whereas the western members of a binarily splitting subgroup tend to be less well defined.

The lack of shared innovations is also what causes many of the world’s language families to have a profusion of coordinate, highest-order subgroups (star-shaped phylogenies). For instance, the eight Indo-European subgroups that are still represented by spoken languages (Albanian, Armenian, Balto-Slavic, Celtic, Germanic, Greek, Indo-Iranian, and Italic) are standardly regarded as connecting to a single node at or near the root of the Indo-European tree, and even more branches would be needed to account for various languages documented only through ancient inscriptions. The situation is similar for Austronesian, Sino-Tibetan, Austroasiatic, Uralic, and several other families, with the record (to judge by Hammarström, Forkel, Haspelmath, & Bank, 2016) being held by the Australian family of Pama-Nyungan languages, which has 21 branches connected to the root. Although some historical linguists would see this situation as normal and expected, it is only expected because the traces of shared innovations apparently tend to get lost over time.
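How a shortage of shared innovations produces star-shaped phylogenies can be made concrete. In the following minimal sketch (the languages, the innovations, and the function build_tree are all hypothetical), each shared innovation defines a clade, the clades are assumed to be nested or disjoint, and any language not covered by a smaller clade is attached directly to the node above it, which is how unsupported languages end up attached to the root.

```python
def build_tree(languages, clades):
    """Return a Newick string for the tree implied by shared-innovation clades.

    `clades` must be pairwise nested or disjoint. A language (or clade) that is
    not inside any smaller supported clade hangs directly from the node above,
    so missing innovations show up as a polytomy.
    """
    clades = list({frozenset(c) for c in clades})  # drop duplicate clades

    def subtree(members):
        inner = [c for c in clades if c < members]            # proper sub-clades
        maximal = [c for c in inner if not any(c < d for d in inner)]
        maximal.sort(key=sorted)                               # deterministic order
        covered = set().union(*maximal)
        children = [subtree(c) for c in maximal] + sorted(members - covered)
        return "(" + ",".join(children) + ")"

    return subtree(frozenset(languages)) + ";"

languages = ["A", "B", "C", "D", "E", "F"]
print(build_tree(languages, [{"A", "B"}, {"A", "B", "C"}, {"D", "E"}]))
# (((A,B),C),(D,E),F);
print(build_tree(languages, []))
# (A,B,C,D,E,F);
```

With no innovations at all, the same function returns a single flat node, the extreme case of a star-shaped phylogeny.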

The idea of shared innovations as a classification criterion assumes a scenario in which a language A undergoes some changes and then splits up into daughter languages in which the changes that A underwent can be recognized. If, however, the populations speaking A become geographically differentiated before A has had time to undergo any distinct changes, then the daughters cannot be recognized as direct descendants of A. It may also happen that the daughters undergo subsequent changes masking the signal inherited from A. Either way, a ‘false negative’ is produced, that is, absence of evidence for a subgroup which in fact exists. ‘False positives’ also occur: the presence of a shared innovation which, in fact, spread after a group of languages had started to differentiate. When such an areally diffused innovation affects a subset of languages belonging to different subgroups, its areal nature can be recognized, but when the innovation affects just the languages of a particular subgroup, its areal nature cannot be recognized in the absence of written documentation of its diffusion. Such cases, which might be called post hoc shared innovations, may be quite common, but since written documentation is necessary for identifying them, they are rarely recognized. An example of a post hoc shared innovation is the suffix ‑wan (or ‑wän), marking the completive aspect of a special kind of intransitive verbs, so-called ‘positionals’, in three of the four currently spoken languages belonging to the Ch’olan subgroup of Mayan languages. In the mid-1980s, before the Classic Ch’olan hieroglyphic inscriptions had been sufficiently deciphered, it was thought that *‑wan should be reconstructed for proto-Ch’olan (Kaufman & Norman, 1984, p. 107), but once more progress had been made in the decipherment of the inscriptions it became clear that ‑wan is in fact an innovation originating in the western Ch’olan dialectal region which spread towards the east, replacing another suffix with the same function (Hruby & Child, 2004). Other examples are the shared Ch’olan loss of distinctive vowel length, which turned out to be post hoc, initially happening in the eastern Ch’olan region (Lacadena & Wichmann, 2002), and the ejective p’, which is quite a late innovation (Wichmann, 2006) but had nevertheless also been reconstructed for proto-Ch’olan on the basis of the modern attestations of the languages of this group (Kaufman & Norman, 1984, p. 87).

Garrett (1999) cites Indo-European examples of post hoc shared innovations similar to the ones from Ch’olan, showing that many of the innovations that had been taken to be diagnostic of subgroups such as Celtic, Italic, and Greek are in fact not shared by all early dialects of the ancestral languages. If all innovations shared among a group of languages are post hoc, there is no longer any evidence for the existence of a single proto-language, and one would have to attach the branches leading to each language to a higher node.

In order to improve the understanding of the real-world counterparts of the nodes in a linguistic phylogeny, it is necessary to carry out more studies of situations where the availability of early, written documentation can give some insights. While a node may sometimes be an abstraction whose counterpart is really a dialect chain and at other times may correspond to a clean separation, the former situation is probably more common.

In summary, when interpreting family trees it should be kept in mind that some absent nodes are likely to be missing simply because of the absence of evidence, while some present nodes are extraneous, being due to post hoc shared innovations. When additionally it is remembered that a family tree leaves no place for lateral transfer (borrowing) it becomes clear that such a tree cannot squarely be equated with a population history as assumed by many scholars from Schleicher (1853) to Bouckaert et al. (2012), even if it serves as a convenient summary of how a group of languages has evolved.

2.2. Lexicostatistics

There was no serious alternative to the comparative method until Morris Swadesh developed lexicostatistics in the 1950s. Although often treated as a single method, it contains several components, some of which are carried over to modern computational methods, while others have been modified or discarded. Using data from the Salishan family of languages of North America, Swadesh (1950) (i) defined a standard list of concepts, (ii) determined whether the corresponding words were cognate or not, (iii) computed the percentage of shared cognates for each pair of languages, and (iv) produced a graphical representation of language relations based on the resulting similarity matrix. He also (v) translated the amount of similarity into time units, hypothesizing that the less similar two languages are, the greater the time depth must be down to their common ancestor. Component (v) was the first step towards the development of the method of glottochronology, which is not an integral part of lexicostatistics. Since the assignment of dates to ancestral languages is merely an addition to the classification, not something that contributes to the classification as such, glottochronology will not be discussed further here.
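Components (ii) and (iii) can be sketched in a few lines, assuming cognate judgments have already been made and are coded as class labels per concept. The data below are hypothetical, and skipping concepts that are missing from one of the two lists is a choice of this sketch rather than a documented feature of Swadesh’s procedure.

```python
from itertools import combinations

def shared_cognate_percentage(list_a, list_b):
    """Percentage of concepts, attested in both lists, whose words share a cognate class."""
    concepts = list_a.keys() & list_b.keys()
    if not concepts:
        raise ValueError("the two lists share no concepts")
    same = sum(list_a[c] == list_b[c] for c in concepts)
    return 100.0 * same / len(concepts)

def similarity_matrix(wordlists):
    """Shared-cognate percentage for every pair of languages."""
    return {(a, b): shared_cognate_percentage(wordlists[a], wordlists[b])
            for a, b in combinations(sorted(wordlists), 2)}

# Hypothetical cognate-class codes: within a concept, same number = same class.
wordlists = {
    "X": {"water": 1, "fire": 1, "hand": 1, "eye": 1},
    "Y": {"water": 1, "fire": 1, "hand": 2, "eye": 1},
    "Z": {"water": 2, "fire": 1, "hand": 3, "eye": 2},
}
print(similarity_matrix(wordlists))
# {('X', 'Y'): 75.0, ('X', 'Z'): 25.0, ('Y', 'Z'): 25.0}
```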

Component (i), the standard list of concepts, was developed further into a 200-item list in Swadesh (1952) and later on (Swadesh, 1955) into one of 100 items, which is the list that is usually thought of as ‘the Swadesh list’ (although it should be noted that there are actually slightly different versions of the 100-item Swadesh list). The use of a standard set of concepts as the basis for a language classification has had an immense impact on the practice of historical linguistics, and many different lists have been devised. The Concepticon resource (List, Cysouw, & Forkel, 2016) contains a collection of 161 concept lists. Most of these concept lists were devised by historical linguists working on languages in different parts of the world.

Component (ii), cognate identification, is inherited from the comparative method, although practitioners of lexicostatistics have often relied on impressionistic criteria rather than basing themselves on the comparative method’s criterion of regular sound correspondences when judging words to be cognate. The component has also been carried over to some modern phylogenetic methods.

Component (iii), computing a similarity score, has been carried over to modern distance-based phylogenetic methods. Such similarity scores need not, however, be based on cognate counts, but can also be an aggregate word similarity (Wichmann, Holman, Bakker, & Brown, 2010; Jäger, 2013).

Component (iv), the particular graphical representation used by Swadesh, where relations are displayed as an arrangement of boxes representing languages, does not seem to have been used by anyone but Swadesh himself. Other practitioners of lexicostatistics have drawn trees, using various cut-off points of similarity for grouping languages together. More recently, a rich set of computational methods has become available for inferring phylogenies. Since the first applications of such methods to linguistic phylogenetics (including Gray & Jordan, 2000), linguists have explored and applied many types of methods, and each year sees more and more publications of such computationally inferred trees.
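One simple way to turn a similarity matrix into groups at a given cut-off is single-linkage grouping: two languages end up in the same group if they are connected by a chain of pairs whose similarity reaches the cut-off. The sketch below assumes this rule, which is not necessarily the one any particular lexicostatistician used, and uses invented percentages; running it at several cut-offs yields the nested groupings from which such trees can be drawn.

```python
def group_at_cutoff(languages, similarity, cutoff):
    """Group languages linked by chains of pairs with similarity >= cutoff (single linkage)."""
    parent = {lang: lang for lang in languages}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in similarity.items():
        if score >= cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for lang in languages:
        groups.setdefault(find(lang), []).append(lang)
    return sorted(sorted(g) for g in groups.values())

# Invented shared-cognate percentages
sims = {("A", "B"): 80, ("A", "C"): 40, ("B", "C"): 45,
        ("C", "D"): 70, ("A", "D"): 20, ("B", "D"): 25}
for cutoff in (90, 60, 40):
    print(cutoff, group_at_cutoff(["A", "B", "C", "D"], sims, cutoff))
# 90 [['A'], ['B'], ['C'], ['D']]
# 60 [['A', 'B'], ['C', 'D']]
# 40 [['A', 'B', 'C', 'D']]
```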

When lexicostatistics is viewed as a set of components rather than a fixed package, it becomes clearer how much current quantitative historical linguistics owes to it, even if hardly anyone practices lexicostatistics in the way that it was classically practiced in the second half of the 20th century.

There has been a certain divide between methodological work criticizing or improving lexicostatistics and work devoted to language classification that applies something close to the original method, ignoring both the criticism and the suggested improvements. The canonical description of classical lexicostatistics is Gudschinsky (1956), whose recipe by and large seems to be followed by most practitioners, such as many fieldworkers in the Pacific region struggling to come to grips with the great linguistic diversity of this part of the world. The aspect of lexicostatistics which has stirred most controversy and criticism is the underlying assumption that the rate of lexical change is (relatively) constant (Bergsland & Vogt, 1962; Blust, 2000). Modern phylogenetic methods, however, do not all rely on a constant rate of change for inferring phylogenies, so the critique has ceased to hold the same relevance as it once did (Lohr, 2000). As regards suggested improvements, these have been concerned with issues such as the list of concepts used, how to deal with partial cognacy, whether to include loanwords, which algorithms to apply when inferring trees, and so on (see Embleton, 2000, and references therein).

If classical lexicostatistics has largely fallen out of use, this is due not to revisions of its methods or to the success of its opponents, but rather to the arrival from biology of phylogenetic methods that are able to infer trees from the full information on individual cognacy relations, avoiding the loss of information implied by computing an aggregate percentage of similarities. These computational methods are briefly described in the next section.

2.3. Character-Based Computational Linguistic Phylogenetics

A character can be any cross-linguistically comparable trait or feature whose value or state can be discretely encoded. Commonly, classes of cognates that refer to the same concept are used. For instance, among Indo-European words for the concept water, one would put together Ancient Greek hýdɔr, Irish ˈiʃkʲə, Russian vɔ’da, and other related items in one group, Latin ˈakʷa in a second, Sanskrit and Avestan āp in a third, and so on. Synonyms may require the introduction of an additional character. For instance, Sanskrit has another word for water, vār, which has cognates in Tocharian A (wär) and B (war). Thus, in this example there would be four cognate classes for water, and for each of them a language would score either a 1 for ‘presence’ or a 0 for ‘absence’. Characters need not be lexical. For instance, if typological features are used, a word order character could have the possible states SOV, SVO, VSO, and so on, and if phonological innovations are used, each change would be encoded as having or not having taken place at some stage in the evolution of a language.
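The water example can be encoded directly as a binary matrix. The sketch below assumes one binary character per cognate class and, for simplicity, treats every language not listed in a class as showing absence; a real data set would also need a way of marking missing data.

```python
# Cognate classes for the concept 'water', grouped as in the example above
cognate_classes = {
    "water_1": {"Ancient Greek", "Irish", "Russian"},        # hýdɔr, ˈiʃkʲə, vɔ’da
    "water_2": {"Latin"},                                    # ˈakʷa
    "water_3": {"Sanskrit", "Avestan"},                      # āp
    "water_4": {"Sanskrit", "Tocharian A", "Tocharian B"},   # vār, wär, war
}
languages = sorted(set().union(*cognate_classes.values()))

def binary_matrix(classes, languages):
    """One binary character per cognate class: 1 = presence, 0 = absence."""
    return {lang: [int(lang in members) for members in classes.values()]
            for lang in languages}

matrix = binary_matrix(cognate_classes, languages)
for lang, row in matrix.items():
    print(f"{lang:<14}", *row)
```

Sanskrit, with its two synonyms, scores 1 for both water_3 and water_4, which is exactly why synonyms require an additional character.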

Once a matrix with characters and their states in a number of languages (or taxa, to continue the use of more general terminology) has been produced, it can be fed into any one of a number of character-based phylogenetic algorithms using various freely available software packages. An excellent overview of basic phylogenetic notions and different methods is given in Nichols and Warnow (2008), and Dunn (2015) is also highly useful, especially for its exposition of the thinking behind Bayesian methods. Felsenstein (2004) and Lemey, Salemi, and Vandamme (2009) are general handbooks on all aspects of phylogenetics, not written particularly for linguists, but useful for gaining a deeper understanding of this field. Here I will restrict myself to giving a candid and personal view of what the current challenges are for computational linguistic phylogenetics (henceforth CLP).

To be a bit blunt, much work in CLP has neglected the proverb that ‘you must learn to crawl before you can walk.’ The field was inaugurated with the publication of family trees making empirical claims about population histories (Gray & Jordan, 2000; Gray & Atkinson, 2003) without even providing the data on which the trees were based. Although sources are indicated, it would not be possible to reconstruct the data sets used from the references given. For instance, regarding the language data most crucial to their analysis, the data from Tocharian A and B and Hittite, Gray and Atkinson (2003, p. 438) simply inform us that they (two biologists) added those themselves. As regards methods, there are references to software and settings, but here, as well as in later literature, there is a tendency to use the latest off-the-shelf method without arguing why one method is better than another. For the field of CLP to mature, it must adhere to the good scientific practice of ensuring full replicability of a study. In addition to more replicable studies, method testing is a desideratum. But to even start testing the performance of different CLP methods, gold standards of comparison are needed. These can be of two kinds: either synthetic (simulated) phylogenies along with synthetic data that have been made to evolve along their branches, or empirical phylogenies containing only nodes for which there is full evidence. As for the latter, Glottolog (Hammarström, Forkel, Haspelmath, & Bank, 2016) comes close. When an inferred tree is compared with the gold standard, it should not be penalized for being more resolved than the gold standard, something which happens in standardly available tree comparison software. The best method for this purpose is the Generalized Quartet Distance of Pompei, Loreto, and Tria (2011), but unfortunately it is not yet implemented in publicly available software.
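The idea of not penalizing extra resolution can be illustrated with quartets, i.e., sets of four taxa. The sketch below approximates the idea behind the Generalized Quartet Distance and is not a reimplementation of Pompei, Loreto, and Tria’s published definition: trees are represented (an assumption of this sketch) as lists of clades, only quartets resolved by the gold standard are scored, and an inferred tree that leaves such a quartet unresolved is counted as wrong, which may differ from the original measure.

```python
from itertools import combinations

def quartet_split(clades, quartet):
    """Split induced on four taxa (a frozenset of two pairs), or None if unresolved."""
    q = frozenset(quartet)
    for clade in clades:
        pair = q & clade
        if len(pair) == 2:  # this clade separates two of the taxa from the other two
            return frozenset({pair, q - pair})
    return None

def generalized_quartet_distance(gold_clades, inferred_clades, taxa):
    """Share of gold-resolved quartets that the inferred tree does not resolve the same way.

    Quartets left unresolved by the gold standard are skipped, so extra
    resolution in the inferred tree is never penalized.
    """
    resolved = differ = 0
    for quartet in combinations(sorted(taxa), 4):
        gold = quartet_split(gold_clades, quartet)
        if gold is None:
            continue
        resolved += 1
        if quartet_split(inferred_clades, quartet) != gold:
            differ += 1
    return differ / resolved if resolved else 0.0

taxa = ["A", "B", "C", "D", "E"]
gold = [frozenset({"A", "B"})]  # only A+B is supported by evidence
agrees = [frozenset({"A", "B"}), frozenset({"A", "B", "C"}), frozenset({"D", "E"})]
conflicts = [frozenset({"B", "C"}), frozenset({"A", "B", "C"}), frozenset({"D", "E"})]
print(generalized_quartet_distance(gold, agrees, taxa))     # 0.0
print(generalized_quartet_distance(gold, conflicts, taxa))  # 0.666...
```

The first inferred tree is more resolved than the gold standard but scores 0.0, since its extra structure does not contradict anything the gold standard asserts.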

One essential issue that needs to be investigated is how the results of different CLP methods compare with real phylogenies. Through simulations it can further be tested how different parameters affect the performance of different CLP methods, including parameters such as variable rates of change, different ways of encoding data (multistate characters vs. binary recodings), the amount of data (e.g., length of word lists), the number of missing cells in a data matrix, the amount of lateral transfer (borrowing), and so on. The most extensive work in this area is Barbançon, Evans, Nakhleh, Ringe, and Warnow (2013). These authors compare the performance of different phylogenetic methods on a synthetic data set, finding the following hierarchy of performance: Maximum Parsimony (MP) > Bayesian Markov chain Monte Carlo (Bayesian MCMC) > Neighbor-Joining (NJ) > Unweighted Pair Group Method with Arithmetic Mean (UPGMA).

UPGMA is an early and very simple method. It starts by joining the two most similar taxa into a cluster and then repeatedly joins the two most similar clusters, where the similarity between two clusters is the average similarity over all pairs of their members, until the tree is complete.
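A minimal UPGMA sketch follows, assuming distances (for example, 100 minus the percentage of shared cognates) are supplied for every pair of taxa. Cluster distances are computed as the average over all pairs of original members, which is what UPGMA’s averaging amounts to; branch lengths are omitted, and the distances in the example are invented.

```python
def upgma(taxa, dist):
    """UPGMA topology from distances given as {frozenset({a, b}): d}.

    Repeatedly merges the two closest clusters; the distance between two
    clusters is the mean distance over all pairs of their original members.
    """
    clusters = {(t,): t for t in taxa}  # tuple of members -> nested-tuple subtree

    def d(x, y):
        return sum(dist[frozenset({a, b})] for a in x for b in y) / (len(x) * len(y))

    while len(clusters) > 1:
        x, y = min(((x, y) for x in clusters for y in clusters if x < y),
                   key=lambda pair: d(*pair))
        clusters[x + y] = (clusters.pop(x), clusters.pop(y))
    return next(iter(clusters.values()))

pairs = {("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
         ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10}
dist = {frozenset(p): v for p, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))  # ((('A', 'B'), 'C'), 'D')
```

Because it joins clusters purely by average distance, UPGMA implicitly assumes that all lineages change at roughly the same rate, which makes it vulnerable when rates vary.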

NJ is similar to UPGMA, but it starts out assuming a star-shaped phylogeny with this structure being revised as the process unfolds, and at each step of the agglomeration process the distance matrix is recalculated. Like UPGMA, it is a distance-based method.
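A compact Neighbor-Joining sketch follows, using the standard Q-criterion and distance update. It returns only the topology, as nested tuples, and breaks ties by taking the first pair encountered; the distances in the example are illustrative rather than linguistic data.

```python
def neighbor_joining(taxa, dist):
    """Neighbor-Joining topology from distances {frozenset({a, b}): d}.

    Branch lengths are omitted. Internal nodes are tuples of the two subtrees
    they join; the two nodes returned at the end share the last unrooted edge.
    """
    nodes = list(taxa)
    d = dict(dist)  # grows as new internal nodes are created

    def D(a, b):
        return d[frozenset({a, b})]

    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence of each node from all the others
        r = {a: sum(D(a, k) for k in nodes if k != a) for a in nodes}
        pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
        # Q-criterion: join the pair minimizing (n - 2) * d(a, b) - r(a) - r(b)
        a, b = min(pairs, key=lambda p: (n - 2) * D(*p) - r[p[0]] - r[p[1]])
        u = (a, b)  # new internal node
        for k in nodes:
            if k != a and k != b:
                d[frozenset({u, k})] = (D(a, k) + D(b, k) - D(a, b)) / 2
        nodes = [k for k in nodes if k != a and k != b] + [u]
    return tuple(nodes)

table = {("a", "b"): 5, ("a", "c"): 9, ("a", "d"): 9, ("a", "e"): 8, ("b", "c"): 10,
         ("b", "d"): 10, ("b", "e"): 9, ("c", "d"): 8, ("c", "e"): 7, ("d", "e"): 3}
dist = {frozenset(p): v for p, v in table.items()}
print(neighbor_joining(["a", "b", "c", "d", "e"], dist))
# (('c', ('a', 'b')), ('d', 'e'))
```

Unlike UPGMA, the Q-criterion corrects for how far each node is from all others, so NJ does not assume a constant rate of change.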

Bayesian MCMC requires a pre-specified model of character evolution. Candidate trees are generated by random modifications of the current tree and evaluated against the data and the model, and trees are sampled in proportion to their posterior probability, so the output is not one but many possible trees, a ‘posterior sample.’ Support values for a node in the tree can be derived from the frequency with which the node occurs in the posterior sample.
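Deriving support values from a posterior sample amounts to counting. The sketch below assumes each sampled tree has already been reduced to its set of clades, a simplification that ignores how the sample is produced; the four-tree sample is invented.

```python
from collections import Counter

def clade_support(posterior_trees):
    """Fraction of sampled trees containing each clade (trees given as lists of clades)."""
    counts = Counter()
    for tree in posterior_trees:
        counts.update({frozenset(clade) for clade in tree})  # count each clade once per tree
    n = len(posterior_trees)
    return {clade: k / n for clade, k in counts.items()}

# Invented posterior sample of four rooted trees over taxa A-D
sample = [
    [{"A", "B"}, {"A", "B", "C"}],
    [{"A", "B"}, {"A", "B", "C"}],
    [{"A", "B"}, {"A", "B", "D"}],
    [{"A", "C"}, {"A", "B", "C"}],
]
for clade, support in sorted(clade_support(sample).items(), key=lambda kv: -kv[1]):
    print(sorted(clade), support)
```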

MP is somewhat similar in spirit to the comparative method in that it searches for the tree which requires the minimum number of changes of character states. Like Bayesian MCMC, it is a character-based method.
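The parsimony criterion can be illustrated with Fitch’s algorithm, a standard way of counting the minimum number of state changes a character requires on a given binary tree. The sketch only scores fixed candidate trees rather than searching tree space, which real MP software must also do; the trees and character states are hypothetical.

```python
def fitch(tree, states):
    """Fitch's algorithm: (possible root states, minimum changes) for one character.

    `tree` is a leaf name or a 2-tuple of subtrees; `states` maps leaves to states.
    """
    if isinstance(tree, str):  # leaf
        return {states[tree]}, 0
    left, right = tree
    set_l, cost_l = fitch(left, states)
    set_r, cost_r = fitch(right, states)
    common = set_l & set_r
    if common:
        return common, cost_l + cost_r
    return set_l | set_r, cost_l + cost_r + 1  # one extra change needed here

def parsimony_score(tree, matrix):
    """Total minimum changes over all characters; matrix maps language -> list of states."""
    n_chars = len(next(iter(matrix.values())))
    return sum(fitch(tree, {lang: row[i] for lang, row in matrix.items()})[1]
               for i in range(n_chars))

# Hypothetical data: two characters for four languages
matrix = {"A": [1, 0], "B": [1, 0], "C": [0, 1], "D": [0, 1]}
print(parsimony_score((("A", "B"), ("C", "D")), matrix))  # 2
print(parsimony_score((("A", "C"), ("B", "D")), matrix))  # 4
```

MP would prefer the first tree, since it accounts for the data with fewer changes.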

In general, character-based methods are expected to perform better than distance-based ones since they make fuller use of the information in a character matrix. But in some situations only distances are available (see next section), and then NJ is generally preferred. If characters are available, biologists normally prefer Bayesian MCMC, and for this reason the biologists who got the CLP field going have routinely used such methods. The results of Barbançon, Evans, Nakhleh, Ringe, and Warnow (2013), however, indicate that there is room for some reconsideration. Moreover, Bayesian methods are problematic from a practical point of view: they are so computationally intensive that larger data sets become unwieldy without access to supercomputing facilities. In addition, the fact that they are model-based, although theoretically an advantage, raises questions about what sort of assumptions are appropriate for linguistic data and makes the results highly dependent on particular decisions taken by the analyst (Chang, Cathcart, Hall, & Garrett, 2015).

2.4. Distance-Based Computational Linguistic Phylogenetics

Character-based methods typically rely on cognate judgments, which in turn require the input of a comparative linguist. For some language families this input may not be available. More crucially, when testing for possible genealogical relations among languages which have not been demonstrated to be related, cognates cannot be the starting point since cognates by definition are related words and therefore require the languages compared to be related. In this situation a measure of distances among languages may be a useful heuristic towards identifying genealogical relations. Finally, for cross-disciplinary work where it is useful to correlate differences among languages with other factors, such as economic relations, geo-political relations, patterns of migrations, cultural exchange, genetic distances, and so forth, a distance-measure is also useful.

A project which serves all the above-mentioned purposes is the Automated Similarity Judgment Program, or ASJP, an open, collaborative endeavor to develop a database of word lists, ideally for all the world’s languages (it currently covers about two thirds of them), together with associated methods and software for producing distance measures. In much of the ASJP work to date, distances among words in a standard 40-item list of particularly stable items from the Swadesh list (Holman et al., 2008) have been calculated using a modified version of the Levenshtein or ‘edit’ distance called LDND. The properties of the LDND are by now well studied, and several studies, including Wichmann, Holman, Bakker, and Brown (2010) and Pompei, Loreto, and Tria (2011), have explored its performance in classifying languages from different families. More recent work, however, has been devoted to the development of more sophisticated methods for aligning strings of symbols and measuring distances (Jäger, 2013), with interesting results for the simultaneous comparison of hundreds of languages in Eurasia (Jäger, 2015) and thousands of languages in the world at large (Jäger & Wichmann, 2016). More work is underway in this area.
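A minimal sketch of the LDND, assuming the definition given by Wichmann, Holman, Bakker, and Brown (2010): the Levenshtein distance between two words is divided by the length of the longer word (LDN), and the mean LDN over same-concept word pairs is divided by the mean LDN over different-concept pairs. The sketch ignores synonyms and ASJP’s transcription conventions, and the word lists are invented.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]

def ldn(a, b):
    """Levenshtein distance normalized by the length of the longer word."""
    return levenshtein(a, b) / max(len(a), len(b))

def ldnd(list_a, list_b):
    """Mean LDN of same-concept pairs divided by mean LDN of different-concept pairs."""
    concepts = sorted(list_a.keys() & list_b.keys())
    same = [ldn(list_a[c], list_b[c]) for c in concepts]
    diff = [ldn(list_a[c1], list_b[c2]) for c1 in concepts for c2 in concepts if c1 != c2]
    return (sum(same) / len(same)) / (sum(diff) / len(diff))

# Invented word lists (one word per concept)
lang_1 = {"hand": "mano", "water": "agua", "eye": "ojo"}
lang_2 = {"hand": "mao", "water": "agwa", "eye": "oyo"}
print(round(ldnd(lang_1, lang_2), 3))
```

Values well below 1 indicate that same-concept words are more similar than chance pairings of words from the two lists would lead one to expect.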

In addition to Neighbor-Joining, the standard tool for producing trees based on distance data, a useful algorithm is Neighbor-Net, which also takes distances as input and produces networks displaying boxes where there is conflicting evidence for the classification of taxa (reticulation). This is implemented in SplitsTree (Huson & Bryant, 2006). Wichmann, Holman, Bakker, and Brown (2010) provide a discussion of the notion of reticulation and an investigation of its causes.

2.5. Classifications Based on Typological Features

During the first decade of the 21st century there was some hope that abstract typological features might offer an opportunity for pushing back the time limit at which genealogical relations can be established (Dunn, Terrill, Reesink, Foley, & Levinson, 2005), and related methodological issues were investigated (Wichmann & Saunders, 2007). The optimism, however, was dampened by the realization that such features are highly prone to diffusion, and an additional problem is the high incidence of homoplasy (independently developed identical character states), which is due to the limited number of values that individual typological features can take (Gray, Bryant, & Greenhill, 2010; Donohue, Musgrave, Whitting, & Wichmann, 2011). Still, typological features carry information about language prehistory and they will have a role to play in future historical linguistics, which is one of the reasons why ongoing work is devoted to the development of ever-larger databases of such features.

3. External Language Comparison

The previous section mentioned how thousands of languages may be compared using string similarity measures in order to discover hints of new long-distance genealogical relations. This approach might be called controlled mass comparison, in contrast to uncontrolled mass comparison. In the latter approach no standardized set of concepts is used; instead, words are picked more or less at random and compared across a set of languages, and cognate relations between them are proposed with great latitude allowed for semantic and phonological differences. Controlled mass comparison is still new but has already proven useful. For instance, it directed researchers to the possibility of a relationship between the Chitimacha language, formerly spoken near New Orleans, and the Totozoquean languages of Mesoamerica. Upon closer scrutiny, much more evidence could be added in support of this link through the application of more traditional methods of comparative linguistics (Brown, Wichmann, & Beck, 2014). Uncontrolled mass comparison, which is used in Greenberg’s work on languages of the Americas (Greenberg, 1987) as well as in work on other parts of the world, has proven less productive. A third method for discovering long-distance genealogical relations is the comparison of reconstructed proto-languages. To facilitate such work, Georgij Starostin and collaborators have developed an online collection of etymological dictionaries called Tower of Babel. However, reconstructed lexica are currently available for fewer than one hundred of the world’s language families, so there is a long way to go before a large-scale, systematic comparison of reconstructed proto-languages is feasible.
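A minimal Python sketch of the screening step in controlled mass comparison, assuming a precomputed table of pairwise distances (for instance LDND values) and a mapping from each language to its currently accepted family. It keeps only cross-family pairs and flags those whose distance is unusually low relative to all cross-family distances. The z-score cutoff of -2 is an arbitrary illustrative choice, not the criterion used in the ASJP work or by Brown, Wichmann, and Beck (2014), and a flagged pair is only a hint that must then be examined with traditional comparative methods.

```python
from statistics import mean, stdev


def flag_cross_family_pairs(
    dist: dict[tuple[str, str], float],
    family: dict[str, str],
    z_cutoff: float = -2.0,
) -> list[tuple[tuple[str, str], float]]:
    """Return cross-family pairs whose distance z-score is below z_cutoff,
    most similar first. Within-family pairs are ignored."""
    cross = {pair: d for pair, d in dist.items() if family[pair[0]] != family[pair[1]]}
    values = list(cross.values())
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all cross-family distances equal: nothing stands out
    flagged = [(pair, (d - mu) / sigma) for pair, d in cross.items()
               if (d - mu) / sigma < z_cutoff]
    return sorted(flagged, key=lambda item: item[1])
```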

If the comparison pauses or stops at an initial stage, there should at minimum be some test of the probability that the similarities encountered are due to chance. An interesting test was proposed by Oswalt (1970): compare the number of apparent cognates between two Swadesh-type lists with the number of apparent cognates in scrambled versions of the same lists. If significantly more cognates are found in the unscrambled comparison than in the scrambled ones, this gives some indication of a genealogical link (provided that the putative cognates are not loanwords). The same idea is embodied in the LDND lexical distance measure of Wichmann, Holman, Bakker, and Brown (2010), used within the ASJP project since 2008. The LDND is the average length-normalized edit distance between words referring to the same concept, divided by the average length-normalized edit distance between words referring to different concepts (cf. also Dunn & Terrill, 2012, for another modern computational version and application of Oswalt’s ‘shift test’).
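The logic of Oswalt's test can be sketched in Python, assuming two equal-length word lists aligned by concept and some yes/no criterion for calling two words similar. The observed number of similar pairs is compared with the counts obtained after every cyclic shift of the second list, each of which pairs every word with a word for a different concept. The criterion used here (identical initial segment) is a deliberately crude, hypothetical stand-in rather than Oswalt's own, and the p-value convention (b + 1)/(m + 1), which counts the unshifted comparison as one of the possible arrangements, is likewise an assumption. A Monte Carlo variant would sample random rearrangements instead of shifts.

```python
from typing import Callable, Sequence


def count_similar(list1: Sequence[str], list2: Sequence[str],
                  similar: Callable[[str, str], bool]) -> int:
    """Number of positions at which the two words count as similar."""
    return sum(1 for a, b in zip(list1, list2) if similar(a, b))


def shift_test(list1: Sequence[str], list2: Sequence[str],
               similar: Callable[[str, str], bool]):
    """Compare the concept-aligned comparison with all cyclic shifts of list2.

    Returns (observed, shifted_counts, p) with p = (b + 1) / (m + 1), where
    b = number of shifts scoring at least as high as the observed count and
    m = number of shifts.
    """
    if len(list1) != len(list2) or len(list1) < 2:
        raise ValueError("need two equal-length lists of at least two items")
    n = len(list1)
    list2 = list(list2)
    observed = count_similar(list1, list2, similar)
    shifted = [count_similar(list1, list2[k:] + list2[:k], similar) for k in range(1, n)]
    b = sum(1 for s in shifted if s >= observed)
    return observed, shifted, (b + 1) / (len(shifted) + 1)


def same_initial(a: str, b: str) -> bool:
    """Hypothetical, deliberately crude similarity criterion: same first segment."""
    return a[:1] == b[:1]
```

With lists of n items there are only n - 1 shifts, so the smallest attainable p-value under this convention is 1/n; longer lists give the test more resolution.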

Significance tests along the lines of Oswalt (1970) may lend further support to an initial suspicion of a genealogical link, but any initial comparison of languages aimed at establishing a distant relationship should be followed by a stage in which all available data are examined exhaustively. Sound correspondences should be established and lexical reconstructions made on the basis of high-quality cognate sets (cf. Brown, Wichmann, & Beck, 2014, for a proposal of how to evaluate the quality of a cognate set). Ideally, grammatical reconstructions should be carried out as well, but this is not always possible. In some cases the languages compared simply have too little morphology, but even extensive, productive morphology is no guarantee of finding cognate material, since such morphology often consists of relatively recently grammaticalized elements.
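Establishing sound correspondences starts from a tally of which segments face each other in aligned cognates. The Python sketch below assumes the cognate pairs have already been aligned segment by segment (with "-" for gaps) and simply counts the correspondence pairs; the `min_support` threshold is a hypothetical parameter. Correspondences that recur across many cognate sets are candidates for regular correspondences, while one-off matches are more likely to reflect chance or borrowing. A real analysis would also have to handle conditioning environments, which this count ignores.

```python
from collections import Counter
from typing import Iterable, Sequence


def correspondence_counts(
    aligned_pairs: Iterable[tuple[Sequence[str], Sequence[str]]],
) -> Counter:
    """Tally segment correspondences over pre-aligned cognate pairs.

    Each item is a pair of equal-length segment sequences; '-' marks a gap.
    """
    counts: Counter = Counter()
    for seq1, seq2 in aligned_pairs:
        if len(seq1) != len(seq2):
            raise ValueError("each cognate pair must be aligned to equal length")
        counts.update(zip(seq1, seq2))
    return counts


def recurrent(counts: Counter, min_support: int = 2) -> dict:
    """Keep only correspondences attested at least `min_support` times."""
    return {pair: n for pair, n in counts.items() if n >= min_support}
```

For instance, the invented cognate pairs pata/fata, pilu/filu, and tapa/tafa yield a count of 3 for the correspondence (p, f), which `recurrent` keeps, while (i, i), (l, l), and (u, u), each attested once, are dropped.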

Proposals for long-distance genealogical relations have been easy prey for critics, since many have rested on poor evidence and some have been downright ludicrous. Random similarities have routinely been presented as evidence for such relations, and critics have routinely shot down such proposals, but there has also been a tendency to criticize any proposal for a long-distance relation as a matter of course, including the better-supported ones. There now appears to be a growing awareness that proof of relationship cannot be an either–or matter when languages are so remotely related that they have lost nearly all signals of their shared ancestry; rather, proposals need to be evaluated on some scale of convincingness. Some proposals convince everyone because they shed new light on the synchronic states of the languages compared and are productive, in the sense that more and more inherited material turns up as more comparisons are made; then there are all the other proposals, which fail to convince everyone. How best to evaluate such proposals in an objective and systematic fashion is a currently active research area.

4. Outlook

The two major periods of innovation in methods of language classification in the past have been the latter half of the 19th century and the 1950s. We are fortunate to be in the middle of a third major period of innovation, which began around the year 2000 and was triggered by the availability of computational power. This has made it possible to carry the innovations of the 1950s into new frameworks, and even more intensive efforts are currently being made to bring the innovations of the 19th century into the computer age as well. Methods for automating the process of cognate identification have been proposed, and this is a lively research area at the moment (Inkpen, Frunza, & Kondrak, 2005; Delmestri & Cristianini, 2010; Hauer & Kondrak, 2011; List, 2012). There is hope that the next two steps in the comparative method, detecting regular sound changes and reconstructing proto-forms, can also be automated (see Hruschka et al., 2015, and Bouchard-Côté, Hall, Griffiths, & Klein, 2013, respectively). All the methods involved are still experimental, generally difficult for the average historical linguist to grasp, and in many cases impossible to apply in practice for lack of software implementations, but there is little doubt that they will soon have a real impact on how historical comparative linguistics is practiced.
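As a point of reference for the cognate-identification methods cited above, here is a naive baseline in Python; it is not the method of any of those papers. Words for a single concept are linked whenever their dissimilarity, taken here as one minus the ratio returned by the standard library's `difflib.SequenceMatcher`, is at most a threshold, and linked words are merged into candidate cognate sets with a union-find structure (single-linkage clustering). The threshold of 0.5 and the choice of similarity measure are assumptions; published methods such as LexStat instead derive language-specific scoring from recurrent sound correspondences.

```python
from difflib import SequenceMatcher


def cluster_cognates(words: dict[str, str], threshold: float = 0.5) -> list[set[str]]:
    """Group the words for one concept (language -> form) into candidate cognate sets.

    Two forms are linked if 1 - SequenceMatcher ratio <= threshold; linked forms
    end up in the same set (single-linkage clustering via union-find).
    """
    langs = list(words)
    parent = {lang: lang for lang in langs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for i, a in enumerate(langs):
        for b in langs[i + 1:]:
            dissimilarity = 1 - SequenceMatcher(None, words[a], words[b]).ratio()
            if dissimilarity <= threshold:
                parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for lang in langs:
        groups.setdefault(find(lang), set()).add(lang)
    return list(groups.values())
```

For the invented forms `{"L1": "hand", "L2": "hant", "L3": "ruka", "L4": "ruku"}` the function returns two candidate sets, {L1, L2} and {L3, L4}. Single linkage is the simplest choice but tends to chain loosely similar words together, one reason why published methods use more careful clustering.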

Ironically, a major obstacle to further developments at the moment is the lack of work on what should be the simplest task of all: preparing dictionaries and morphological descriptions in standard, computer-readable formats as input for the computational methods. Advanced machinery is being constructed, but until radical measures are taken with regard to data preparation, the machines will have nothing to work on. There is an important role to play here for consortia of data-oriented linguists and, more generally, for collaboration among all researchers with an interest in historical linguistics, from the fieldworker to the computer scientist.

Further reading

Campbell, L., & Poser, W. J. (2008). Language classification: History and method. Cambridge, U.K.: Cambridge University Press.

Dunn, M. (2015). Language phylogenies. In C. Bowern & B. Evans (Eds.), The Routledge handbook of historical linguistics (pp. 190–211). London: Routledge.

Embleton, S. (2000). Lexicostatistics/glottochronology: From Swadesh to Sankoff to Starostin to future horizons. In C. Renfrew, A. McMahon, & L. Trask (Eds.), Time depth in historical linguistics (pp. 143–165). Cambridge, U.K.: The McDonald Institute for Archaeological Research.

Felsenstein, J. (2004). Inferring phylogenies. Sunderland, MA: Sinauer Associates.

François, A. (2015). Trees, waves and linkages: Models of language diversification. In C. Bowern & B. Evans (Eds.), The Routledge handbook of historical linguistics (pp. 161–189). London: Routledge.

Gray, R. D., Bryant, D., & Greenhill, S. J. (2010). On the shape and fabric of human history. Philosophical Transactions of the Royal Society B, 365, 3923–3933. doi:10.1098/rstb.2010.0162

Holman, E. W., Wichmann, S., Brown, C. H., Velupillai, V., Müller, A., & Bakker, D. (2008). Explorations in automated language comparison. Folia Linguistica, 42(2), 331–354.

Hymes, D. (1960). Lexicostatistics so far. Current Anthropology, 1(1), 3–44.

Kessler, B. (2008). The mathematical assessment of long-range linguistic relationships. Language and Linguistics Compass, 2(5), 821–839.

Nichols, J. (1996). The comparative method as heuristic. In M. Durie & M. Ross (Eds.), The comparative method reviewed: Regularity and irregularity in language change (pp. 39–71). Oxford: Oxford University Press.

Nichols, J., & Warnow, T. (2008). Tutorial on computational linguistic phylogeny. Language and Linguistics Compass, 2(5), 760–820.

Paradis, E. (2012). Analysis of phylogenetics and evolution with R (2nd ed.). New York: Springer.

Ringe, D., & Eska, J. F. (2013). Historical linguistics: Toward a twenty-first century reintegration. Cambridge: Cambridge University Press.

Ross, M. (1997). Social networks and kinds of speech community events. In R. M. Blench & M. Spriggs (Eds.), Archaeology and language I: Theoretical and methodological orientations (pp. 209–261). London: Routledge.

Trask, R. L. (1996). Historical linguistics. London: Oxford University Press.

References

Barbançon, F., Evans, S. N., Nakhleh, L., Ringe, D., & Warnow, T. (2013). An experimental study comparing linguistic phylogenetic reconstruction methods. Diachronica, 30(2), 143–170. doi:10.1075/dia.30.2.01bar

Bergsland, K., & Vogt, H. (1962). On the validity of glottochronology. Current Anthropology, 3(2), 115–153.

Blust, R. (2000). Why lexicostatistics doesn’t work: The “universal constant” hypothesis and the Austronesian languages. In C. Renfrew, A. McMahon, & L. Trask (Eds.), Time depth in historical linguistics (Vol. 2, pp. 311–331). Cambridge, U.K.: The McDonald Institute for Archaeological Research.

Bouchard-Côté, A., Hall, D., Griffiths, T. L., & Klein, D. (2013). Automated reconstruction of ancient languages using probabilistic models of sound change. Proceedings of the National Academy of Sciences of the U.S.A., 110, 4224–4229.

Bouckaert, R., Lemey, P., Dunn, M., Greenhill, S. J., Alekseyenko, A. V., Drummond, A. J., Gray, R. D., et al. (2012). Mapping the origins and expansion of the Indo-European language family. Science, 337, 957–960. doi:10.1126/science.1219669

Brown, C. H., Holman, E. W., & Wichmann, S. (2013). Sound correspondences in the world’s languages. Language, 89(1), 4–29.

Brown, C. H., Wichmann, S., & Beck, D. (2014). Chitimacha: A Mesoamerican language in the Lower Mississippi Valley. International Journal of American Linguistics, 80(4), 425–474.

Brugmann, K. (1884). Zur Frage nach den Verwandtschaftsverhältnissen der Indo-Germanischen Sprachen. Internationale Zeitschrift für allgemeine Sprachwissenschaft, 1, 226–256.

Campbell, L. (1977). Quichean linguistic prehistory. University of California Publications in Linguistics, 81. Berkeley: University of California Press.

Chang, W., Cathcart, C., Hall, D., & Garrett, A. (2015). Ancestry-constrained phylogenetic analysis supports the Indo-European steppe hypothesis. Language, 91(1), 194–244.

Delmestri, A., & Cristianini, N. (2010). String similarity measures and PAM-like matrices for cognate identification. Bucharest Working Papers in Linguistics, 12(2), 71–82.

Donohue, M., Musgrave, S., Whitting, B., & Wichmann, S. (2011). Typological feature analysis models linguistic geography. Language, 87(2), 369–383.

Dunn, M. (2015). Language phylogenies. In C. Bowern & B. Evans (Eds.), The Routledge handbook of historical linguistics (pp. 190–211). London: Routledge.

Dunn, M., & Terrill, A. (2012). Assessing the lexical evidence for a Central Solomons Papuan family using the Oswalt Monte Carlo Test. Diachronica, 29(1), 1–27. doi:10.1075/dia.29.1.01dun

Dunn, M., Terrill, A., Reesink, G., Foley, R. A., & Levinson, S. C. (2005). Structural phylogenetics and the reconstruction of ancient language history. Science, 309, 2072–2075. doi:10.1126/science.1114615

Embleton, S. M. (1986). Statistics in historical linguistics. Bochum, Germany: Brockmeyer.

Felsenstein, J. (2004). Inferring phylogenies. Sunderland, MA: Sinauer Associates.

François, A. (2015). Trees, waves and linkages: Models of language diversification. In C. Bowern & B. Evans (Eds.), The Routledge handbook of historical linguistics (pp. 161–189). London: Routledge.

Garrett, A. (1999). A new model of Indo-European subgrouping and dispersal. In S. S. Chang, L. Liaw, & J. Ruppenhofer (Eds.), Proceedings of the twenty-fifth annual meeting of the Berkeley Linguistics Society, February 12–15 (pp. 146–156). Berkeley: Berkeley Linguistics Society.

Gray, R. D., & Atkinson, Q. D. (2003). Language-tree divergence times support the Anatolian theory of Indo-European origin. Nature, 426, 435–439. doi:10.1038/nature02029

Gray, R. D., Bryant, D., & Greenhill, S. J. (2010). On the shape and fabric of human history. Philosophical Transactions of the Royal Society B, 365, 3923–3933. doi:10.1098/rstb.2010.0162

Gray, R. D., Drummond, A. J., & Greenhill, S. J. (2009). Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science, 323, 479–483. doi:10.1126/science.1166858

Gray, R. D., & Jordan, F. M. (2000). Language trees support the express-train sequence of Austronesian expansion. Nature, 405(6790), 1052–1055.

Greenberg, J. H. (1987). Language in the Americas. Stanford, CA: Stanford University Press.

Greenhill, S. J., Currie, T. E., & Gray, R. D. (2009). Does horizontal transmission invalidate cultural phylogenies? Proceedings of the Royal Society B, 276, 2299–2306.

Gudschinsky, S. (1956). The ABC’s of lexicostatistics. Word, 12, 175–210.

Hammarström, H., Forkel, R., Haspelmath, M., & Bank, S. (2016). Glottolog 2.7. Jena: Max Planck Institute for the Science of Human History. Available online at Accessed June 6, 2016.

Hauer, B., & Kondrak, G. (2011). Clustering semantically equivalent words into cognate sets in multilingual lists. In 5th international joint conference on natural language processing, IJCNLP 2011, 865–873.

Holman, E. W., Wichmann, S., Brown, C. H., Velupillai, V., Müller, A., & Bakker, D. (2008). Explorations in automated language comparison. Folia Linguistica, 42(2), 331–354.

Hruby, Z. X., & Child, M. B. (2004). Chontal linguistic influence in Ancient Maya writing: Intransitive positional verbal affixation. In S. Wichmann (Ed.), The Linguistics of Maya writing (pp. 13–26). Salt Lake City: University of Utah Press.

Hruschka, D. J., Branford, S., Smith, E. D., Wilkins, J., Meade, A., Pagel, M., & Bhattacharya, T. (2015). Detecting regular sound changes in linguistics as events of concerted evolution. Current Biology, 25(1), 1–9. doi:10.1016/j.cub.2014.10.064

Huson, D. H., & Bryant, D. (2006). Application of phylogenetic networks in evolutionary studies. Molecular Biology and Evolution, 23, 254–267. doi:10.1093/molbev/msj030

Inkpen, D., Frunza, O., & Kondrak, G. (2005). Automatic identification of cognates and false friends in French and English. Paper presented at the second international conference on recent advances in natural language processing (RANLP 2005). Borovets, Bulgaria, September 21–23, 2005.

Jäger, G. (2013). Phylogenetic inference from word lists using weighted alignment with empirically determined weights. Language Dynamics and Change, 3, 245–291. doi:10.1163/22105832-13030204

Jäger, G. (2015). Support for linguistic macrofamilies from weighted sequence alignment. Proceedings of the National Academy of Sciences of the U.S.A., 112(41), 12752–12757. doi:10.1073/pnas.1500331112

Jäger, G., & Wichmann, S. (2016). Inferring the world tree of languages from word lists. In S. G. Roberts, C. Cuskley, L. McCrohon, L. Barceló-Coblijn, O. Feher, & T. Verhoef (Eds.), The evolution of language: Proceedings of the 11th international conference (EVOLANG11).

Kaufman, T., & Norman, W. M. (1984). An outline of proto-Cholan phonology, morphology, and vocabulary. In J. S. Justeson & L. Campbell (Eds.), Phoneticism in Mayan hieroglyphic writing. Institute of Mesoamerican Studies, State University of New York at Albany, Publication No. 9, 77–166. Albany: Institute for Mesoamerican Studies, State University of New York at Albany.

Kikusawa, R. (2015). The Austronesian language family. In C. Bowern & B. Evans (Eds.), The Routledge handbook of historical linguistics (pp. 190–211). London: Routledge.

Lacadena, A., & Wichmann, S. (2002). The distribution of Lowland Maya languages in the Classic period. In V. Tiesler, R. Cobos, & M. G. Robertson (Eds.), La organización social entre los mayas. Memoria de la Tercera Mesa Redonda de Palenque (Vol. II, pp. 275–314). Mexico City: Instituto Nacional de Antropología e Historia & Universidad Autónoma de Yucatán.

Lemey, P., Salemi, M., & Vandamme, A. (Eds.). (2009). The phylogenetic handbook. Cambridge: Cambridge University Press.

Leskien, A. (1876). Die Declination im Slawisch-Litauischen und Germanischen. Leipzig: Hirzel.

List, J.-M. (2012). LexStat: Automatic detection of cognates in multilingual wordlists. Proceedings of the EACL 2012 Joint Workshop of LINGVIS & UNCLH, 117–125.

List, J.-M., Cysouw, M., & Forkel, R. (Eds.). (2016). Concepticon. Jena: Max Planck Institute for the Science of Human History. Available online at Accessed June 10, 2016. doi:10.5281/zenodo.51259

Lohr, M. (2000). New approaches to lexicostatistics and glottochronology. In C. Renfrew, A. McMahon, & L. Trask (Eds.), Time depth in historical linguistics (pp. 209–222). Cambridge, U.K.: The McDonald Institute for Archaeological Research.

Nakhleh, L., Ringe, D., & Warnow, T. (2005). Perfect phylogenetic networks: A new methodology for reconstructing the evolutionary history of natural languages. Language, 81(2), 382–420.

Nichols, J., & Warnow, T. (2008). Tutorial on computational linguistic phylogeny. Language and Linguistics Compass, 2(5), 760–820. doi:10.1111/j.1749-818x.2008.00082.x

Oswalt, R. L. (1970). The detection of remote linguistic relationships. Computer Studies in the Humanities and Verbal Behavior, 3(3), 117–129.

Pompei, S., Loreto, V., & Tria, F. (2011). On the accuracy of language trees. PLoS ONE, 6(6), e20109. doi:10.1371/journal.pone.0020109

Ringe, D., & Eska, J. F. (2013). Historical linguistics: Toward a twenty-first century reintegration. Cambridge: Cambridge University Press.

Ross, M. D. (1988). Proto Oceanic and the Austronesian languages of Western Melanesia. Canberra, Australia: Pacific Linguistics.

Ross, M. D. (1997). Social networks and kinds of speech community events. In R. M. Blench & M. Spriggs (Eds.), Archaeology and language I: Theoretical and methodological orientations (pp. 209–261). London: Routledge.

Schleicher, A. (1853). Die ersten Spaltungen des indogermanischen Urvolkes. Allgemeine Monatsschrift für Sprachwissenschaft und Literatur (August), 786–787.

Schmidt, J. (1872). Die Verwandtschaftsverhältnisse der indogermanischen Sprachen. Weimar, Germany: Böhlau.

Swadesh, M. (1950). Salish internal relationships. International Journal of American Linguistics, 16(4), 157–167.

Swadesh, M. (1952). Lexico-statistic dating of prehistoric ethnic contacts: With special reference to North American Indians and Eskimos. Proceedings of the American Philosophical Society, 96(4), 453–463.

Swadesh, M. (1955). Towards greater accuracy in lexicostatistic dating. International Journal of American Linguistics, 21(2), 121–137.

Wichmann, S. (2006). A new look at linguistic interaction in the lowlands as a background for the study of Maya codices. In R. Valencia Rivera & G. Le Fort (Eds.), Sacred books, sacred languages: Two thousand years of ritual and religious Maya literature. Proceedings of the 8th European Maya Conference, Madrid, November 25–30, 2003 (Acta Mesoamericana, 18) (pp. 45–64). Markt Schwaben, Germany: Verlag Anton Saurwein.

Wichmann, S., Holman, E. W., Bakker, D., & Brown, C. H. (2010). Evaluating linguistic distance measures. Physica A, 389, 3632–3639. doi:10.1016/j.physa.2010.05.011

Wichmann, S., Holman, E. W., Rama, T., & Walker, R. S. (2011). Correlates of reticulation in linguistic phylogenies. Language Dynamics and Change, 1, 205–240. doi:10.1163/221058212X648072

Wichmann, S., & Saunders, A. (2007). How to use typological databases in historical linguistic research. Diachronica, 24(2), 373–404.