
Computational Approaches to Morphology

Summary and Keywords

Computational psycholinguistics has a long history of investigation and modeling of morphological phenomena. Several computational models have been developed to deal with the processing and production of morphologically complex forms and with the relation between linguistic morphology and psychological word representations. Historically, most of this work has focused on modeling the production of inflected word forms, leading to the development of models based on connectionist principles and other data-driven models such as Memory-Based Language Processing (MBLP), Analogical Modeling of Language (AM), and Minimal Generalization Learning (MGL). In the context of inflectional morphology, these computational approaches have played an important role in the debate between single and dual mechanism theories of cognition. Taking a different angle, computational models based on distributional semantics have been proposed to account for several phenomena in morphological processing and composition. Finally, although several computational models of reading have been developed in psycholinguistics, none of them have satisfactorily addressed the recognition and reading aloud of morphologically complex forms.

Keywords: morphology, word recognition, inflection, distributional semantics, connectionism, exemplar-based, compounds, naive discriminative learning, rules, dual mechanism

Two broad categories of questions in morphology can be approached using computational techniques. In the area of computational linguistics, computational techniques can be used to address problems such as identifying a word’s morphological constituents in automatic processing of text. In the field of psycholinguistics, computational techniques can help to understand how morphology is psychologically represented. After a very brief overview of the role of morphology in computational linguistics, the majority of this article will be devoted to computational approaches to morphology in psycholinguistics.

1. Morphology in Computational Linguistics

In computational linguistics, morphological analysis primarily serves to solve practical problems. For instance, in English, compound words can be written without a space, with a hyphen, or with a space. Tokenization processes that treat anything delimited by spaces or punctuation as a word will therefore treat the parts of spaced compounds as separate tokens. A basic solution is to use a curated index of compound words. More advanced solutions identify compounds based on n‑gram statistics so that only likely compounds are retained (e.g., Su, Wu, & Chang, 1994). In automatic text processing, it is also useful to have methods to identify whether several morphological forms derive from the same lemma. A straightforward strategy for lemmatization is to use tables that list form–lemma correspondences. However, especially in highly inflected languages, the number of potential inflected forms can become so large that table-based lookup becomes impractical or impossible. In addition, lemmatization based on lookup can never be exhaustive, as new words are coined all the time and inflection of novel forms is a highly productive process. A solution is to use stemming algorithms, which remove inflectional affixes from forms to obtain a matching lemma (Porter, 2006; Paice, 1994).
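To make the two strategies concrete, the sketch below combines table lookup with suffix stripping as a fallback. The lemma table and suffix rules are toy assumptions chosen for exposition; they do not implement the Porter stemmer.

```python
# Sketch of lookup-based lemmatization with suffix stripping as a
# fallback. The lemma table and suffix rules are toy assumptions for
# exposition; they do not implement the Porter stemmer.

LEMMA_TABLE = {"went": "go", "mice": "mouse", "children": "child"}

# (suffix, replacement) pairs, tried in order; purely illustrative.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

def lemmatize(form: str) -> str:
    """Table lookup first; fall back to stripping a known suffix."""
    if form in LEMMA_TABLE:                  # lookup is never exhaustive
        return LEMMA_TABLE[form]
    for suffix, replacement in SUFFIX_RULES:
        if form.endswith(suffix) and len(form) > len(suffix) + 2:
            return form[: -len(suffix)] + replacement
    return form                              # treat as uninflected

print(lemmatize("went"))    # go    (table)
print(lemmatize("cities"))  # city  (suffix rule)
print(lemmatize("walked"))  # walk  (suffix rule)
```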

2. Computational Approaches to Modeling Inflection

2.1 Connectionist Models

In computational psycholinguistics, one of the important questions is how language users are able to generate novel inflected, compound, and derivational forms. Rumelhart and McClelland (1986) were the first to propose a connectionist model of inflectional morphology. During the learning phase, the model was presented with English present tense forms on its input layer and corresponding past tense forms on its output layer. Eventually, the model learned to correctly produce most past tense forms from the present tense forms, suggesting that the knowledge required for this conversion can be stored in connection weights. The model also showed a U-shaped learning curve, similar to what has been observed in acquisition: Children first produce irregular forms correctly, then overgeneralize the regular patterns to irregulars, and finally start producing the irregulars correctly again. In connectionist models, the same effect can be achieved by starting training with a small set of frequent regular and irregular forms and later expanding that set to less frequent forms. Despite its apparent success, the model was strongly criticized by Pinker and Prince (1988). Many of the shortcomings they identified were addressed in later work by Plunkett and Marchman (1991, 1993), Joanisse and Seidenberg (1999), and Cottrell and Plunkett (1994).

Around the same time that connectionist models of inflection were developed, three other proposals emerged that relied on data-driven principles to explain both regular and irregular inflection. All of these models start out with a knowledge base consisting of a list of exemplars associated with a label corresponding to the operation required to produce the inflected form. For instance, the form walk would have the label ‘+ed’ while the form sing would have the label ‘i>a’. These data-driven models share the assumption that correct inflectional forms can be produced by relying solely on a database of encoded forms. However, they differ in the way in which this is achieved.

2.2 Memory-Based Language Processing

Memory-Based Language Processing (MBLP)1 is an application and extension of the k-nearest neighbor (“k-nn”) algorithm (Fix & Hodges, 1951) to linguistic material, first developed by Daelemans and his colleagues at the end of the 1980s (see Daelemans & van den Bosch, 2005, for an overview). The central tenet of MBLP is that all encountered exemplars are kept in memory and that the creation of new forms is driven by generalization on the basis of similar memorized forms. In k-nn, the symbol k refers to the number of forms that are taken into account for generalization. When k=1, a generalization is based on the single form that is least distant from the novel form or, when more than one exemplar lies at that same distance, on the class shared by the majority of those exemplars. Applied to the English past tense, for instance, memory-based learning would predict that the past tense form of the novel verb to spling would be splung on the basis of similar sounding verbs such as spring–sprung, cling–clung, or swing–swung. On the other hand, the past tense form of to plip would become plipped because of its similarity to forms such as to slip–slipped, clip–clipped, or flip–flipped (Keuleers, 2008).
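The k-nn principle behind these predictions can be sketched as follows. The right-aligned letter encoding, the simple overlap distance, and the miniature exemplar set are simplifying assumptions made here for illustration, not the weighted feature metrics of actual MBLP implementations such as TiMBL.

```python
# Minimal k-nearest-neighbor sketch for past-tense class prediction.
# Exemplars pair a form with an inflection label; the right-aligned
# letter encoding, overlap distance, and tiny exemplar set are toy
# assumptions, not the weighted metrics of actual MBLP systems.

from collections import Counter

EXEMPLARS = [
    ("spring", "i>u"), ("cling", "i>u"), ("swing", "i>u"),
    ("slip", "+ed"), ("clip", "+ed"), ("flip", "+ed"), ("walk", "+ed"),
]

def distance(a: str, b: str) -> int:
    """Count of mismatching letters after right alignment."""
    width = max(len(a), len(b))
    return sum(x != y for x, y in zip(a.rjust(width, "_"),
                                      b.rjust(width, "_")))

def predict(form: str, k: int = 1) -> str:
    """Majority label among the k nearest exemplars (ties broken arbitrarily)."""
    nearest = sorted(EXEMPLARS, key=lambda ex: distance(form, ex[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(predict("spling"))  # i>u  (closest: spring)
print(predict("plip"))    # +ed  (closest: slip, clip, flip)
```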

The earliest memory-based models of morphological processes focused on Dutch diminutive formation and were able to predict diminutives encountered in corpora with high accuracy (Daelemans, Berck, & Gillis, 1997). MBLP researchers also began to examine psycholinguistic evidence, such as child acquisition data and experimentally elicited inflections of novel forms. Dutch plural inflection was studied by Keuleers et al. (2007) and Keuleers and Daelemans (2007). Memory-based learning has also been applied to linking elements in Dutch compounds (Krott, Baayen, & Schreuder, 2001), Serbian instrumental inflection (Milin, Keuleers, & Filipović-Đurđević, 2011), and English past tense formation (Hahn & Nakisa, 2000; Keuleers, 2008; Nakisa & Hahn, 1996).

Simulations with memory-based learning have shown that relatively few examples are needed for correct generalization. In their methodological study on Dutch plural inflection, Keuleers and Daelemans (2007) showed that when accuracy is based on the number of attested forms that can be correctly predicted, the best value for k is usually 1, whereas for experimentally elicited inflection of novel forms, k=7 is usually the best value. This difference in neighborhood size reflects the tension between creative use of morphology, which relies on more general patterns, and the prediction of attested complex forms, which is more exception-based.

Over the years, there has been a sustained focus on offering an efficient, optimized, and user-friendly code base for memory-based language processing. This has led to numerous iterations of the Tilburg Memory Based Learner (TiMBL), which also has an accessible reference guide explaining all of its parameters and options (Daelemans, Zavrel, van der Sloot, & van den Bosch, 2004).

2.3 Analogical Modeling

A second data-driven approach that was developed during the 1980s is Analogical Modeling (AM),2 first described in detail in Skousen (1989). While memory-based learning uses the most similar exemplars stored in memory as a basis for extrapolation, AM takes a more intricate route to determine the basis for analogy, making use of supracontexts, which are essentially abstractions over exemplars that ignore one or more of their characteristics. A supracontext can thus be seen as a level at which exemplars can be grouped. Since supracontexts are created recursively, there exists a global supracontext that matches all exemplars in the data set. In its decision-making process, AM considers only homogeneous supracontexts, a supracontext being homogeneous when there is no more disagreement about the class of the exemplars it matches than in any of its subcontexts. The collection of homogeneous supracontexts is called the analogical set. For instance, if we represent an exemplar by its sequence of letters, then sip would have the supracontexts -ip, s-p, si-, -i-, s--, --p, and ---. The exemplar tip would share the supracontexts -ip, -i-, --p, and --- with sip. Since to sip and to tip both take the suffix -ed in the past tense, those supracontexts are homogeneous. When determining the inflectional class of a novel exemplar such as bip, AM finds the homogeneous supracontexts and outputs either the probability of each inflectional class in these contexts or a discrete decision reflecting the majority class. AM also has an extra parameter that can be used to vary the proportion of exemplars taken into account for a decision. This parameter is inspired by the idea of an imperfect memory and has clear consequences for the behavior of the model, with less frequent classes being affected more by imperfect memory.
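The construction of supracontexts can be sketched compactly. The following illustration, which assumes plain letter strings, enumerates the supracontexts of a form by masking positions and intersects them for the sip/tip example from the text; the homogeneity test over a full exemplar database is omitted.

```python
# Sketch of supracontext construction in Analogical Modeling: the
# generalizations of a form are obtained by masking one or more of
# its positions with '-'. The homogeneity test over a full exemplar
# database is omitted; this only reproduces the sip/tip example.

from itertools import combinations

def supracontexts(form: str) -> set:
    """All generalizations of `form` obtained by masking positions."""
    contexts = set()
    for n in range(1, len(form) + 1):           # mask 1 .. all positions
        for masked in combinations(range(len(form)), n):
            contexts.add("".join("-" if i in masked else c
                                 for i, c in enumerate(form)))
    return contexts

print(sorted(supracontexts("sip")))
# ['---', '--p', '-i-', '-ip', 's--', 's-p', 'si-']
print(sorted(supracontexts("sip") & supracontexts("tip")))
# ['---', '--p', '-i-', '-ip']
```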

In morphology, AM, like MBLP, has been applied mostly to inflection. An early study by Skousen (1989) covered the Finnish past tense, which is mostly regular but contains some subregularities, and showed that the existing situation in Finnish fits well with the predictions made by AM. Later applications have focused on Spanish verbal inflection (Eddington, 2009), Spanish gender assignment (Eddington, 2002b; Eddington & Lonsdale, 2007), Spanish diminutives (Eddington, 2002c), and the English past tense (Chandler, 2010; Eddington, 2004).

In a comparison of MBLP with AM on the task of German plural prediction, Daelemans, Gillis, and Durieux (1997) concluded that both algorithms not only perform the task with similar accuracy, but that the patterns of errors are also very similar. Eddington (2002a) compared AM and MBLP on the task of Spanish stress assignment and showed that while there were some minor differences, both models have about the same accuracy on predicting existing forms and that they display the same hierarchy of difficulty in assigning stress, consistent with patterns attributed to children who are learning Spanish.

AM has a canonical implementation in Perl. In computational terms, the algorithm is much slower than MBLP because its execution time increases exponentially with the number of exemplars.

2.4 Minimal Generalization Learning

Minimal Generalization Learning (MGL) is a data-driven model that, like MBLP and AM, relies on a database of exemplars (Albright & Hayes, 2003). It is conceptually different from those two approaches because the exemplars are used to build a system of rules and are not consulted directly in the decision process. Despite this, MGL is very similar to AM in its use of contexts to match different exemplars. In MGL, contexts are constructed by pairwise comparison of verbs that undergo the same change in inflection. The model is called minimal generalization learning because, when exemplars with the same inflectional change are compared, it constructs the minimal context matching both patterns. For instance, comparing the verbs spring–sprung and sting–stung leads to the context /s__ŋ/, which matches all verbs beginning with /s/ and ending in /ŋ/. By presenting each exemplar and its inflectional change to the system and comparing it to previously evaluated verbs, multiple rules are constructed indicating which change can be applied in which context. MGL deals with different changes occurring in the same context by computing a reliability for each rule. This reliability is a simple probability, computed by dividing the number of exemplars in a context that show a particular change by the total number of exemplars covered by the context. A problem with this approach is that any rule covering just two exemplars that share a change has maximum reliability. In general, rules covering fewer exemplars have a high chance of being very reliable while offering only very narrow generalizations. Because of this, MGL adjusts the reliability of a rule for its scope, with rules covering few exemplars receiving a large downward adjustment. In practice, this leads to general rules having higher adjusted reliability than specific rules.
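The core of the procedure, extracting the minimal context shared by two exemplars, can be sketched as follows. The sketch operates on orthographic strings for simplicity, so spring and sting yield s__ing rather than the phonological /s__ŋ/ of the text; the reliability computation and its confidence adjustment are omitted.

```python
# Sketch of minimal generalization over two exemplars that share an
# inflectional change: keep the shared prefix and suffix, and replace
# the differing middle with a wildcard. Reliability scoring and the
# confidence adjustment for rule scope are omitted.

def minimal_context(a: str, b: str) -> str:
    """Shared prefix + wildcard + shared suffix of two forms."""
    prefix = 0
    while prefix < min(len(a), len(b)) and a[prefix] == b[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < min(len(a), len(b)) - prefix
           and a[-1 - suffix] == b[-1 - suffix]):
        suffix += 1
    return a[:prefix] + "__" + a[len(a) - suffix:]

# Orthographic analogue of the /s__ŋ/ context in the text.
print(minimal_context("spring", "sting"))   # s__ing
```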

When a novel form is presented to the MGL model, it will usually be covered by several rules, suggesting the same or different changes. From all the rules covering the form, MGL selects the rule with the highest reliability and outputs its associated inflectional change. Alternatively, MGL can give, for each possible inflectional change, the probability associated with the most reliable rule matching the form.

Albright and Hayes (2003) have argued that there is a fundamental difference between MGL and models that are based on analogy, such as MBL and AM, because, in MGL, a rule context is a structural description of which forms may match. They claim that this structured similarity uniquely allows MGL to identify islands of reliability (IORs): contexts in which there is an unusually high support for a particular inflectional pattern. For instance, in the context /s__ŋ/, which matches the group of irregular verbs like sing–sung, the structural change /ɪ/–/ʌ/ is exceptionally reliable. Analogy-based models, which do not use such structural descriptions, would be unable to identify these islands. However, data comparing MGL to MBL and AM do not strongly support the idea that structured similarity is essential for data-driven models of inflection (Chandler, 2010; Keuleers, 2008).

Although MGL is cast as a rule-based model by its authors, it shares many features with the exemplar-based approaches discussed above. In particular, the minimal generalization procedure shares with AM the emphasis on avoiding direct comparison between exemplars by looking for broader contexts. This distinguishes MGL and AM from MBLP: The first two models structure the lexicon by means of contexts, while MBLP assumes that no such structuring is necessary. In addition, while MGL can be implemented as a rule-based system, it can also be implemented as an exemplar-based approach, in which comparisons are done at runtime instead of relying on pre-derived rules (Keuleers, 2008).

2.5 Dual Mechanism Models

A substantial amount of computational modeling in morphology has been driven by the claim that the data on how language users choose to inflect existing and novel forms cannot be explained by models that have only an exemplar-based component. The dual mechanism view of morphology, based mostly on observation of English and German inflection, holds that inflection is characterized by a symbolic rule component that applies in noncanonical cases, such as borrowings and novel forms, and an exemplar-based component that handles only the cases that are not covered by the default rule (e.g., Marcus et al., 1995; Prasada & Pinker, 1993).

Interestingly, the evidence brought by proponents of the dual mechanism approach does not rely so much on computational models implementing the approach as on observing the shortcomings of single mechanism models (e.g., Pinker & Prince, 1988). In response, a large body of work has shown that the phenomena for which a dual mechanism is theoretically posited can actually be explained by single mechanism models and that computational implementations of dual mechanism models usually do not offer a better account (e.g., Albright & Hayes, 2003; Hahn & Nakisa, 2000; Keuleers et al., 2007). The debate about single vs. dual mechanism models of morphology can be seen as reflecting a general tension between purely theoretical models and computational implementations: theoretical analysis tends to lead to the postulation of additional mechanisms without exploring how simpler computational implementations can explain results in ways unforeseen by the theoretical analysis. Veríssimo and Clahsen’s (2014) study of Portuguese verbal inflection is so far the only study in which proponents of the dual mechanism view have offered a computational implementation of their model.

3. Modeling Morphology Using Distributional Semantics

Distributional semantics is a generic term for different methods that derive semantic representations from word co-occurrence relations in corpora. The product of a distributional semantic analysis is a vector space—typically with hundreds of dimensions—in which words are represented as numerical vectors. In the most basic implementation of such a vector space, each number in a word vector indicates whether that word occurs in a particular context, such as a document. In more complex implementations, the numbers in a vector can, for instance, represent how well the word is predicted by other words. In any case, the similarity between vectors, and therefore between the words they represent, can be quantified using any of a number of mathematical similarity or distance measures.

Recent developments in methods for generating these vector spaces have spurred various computational investigations of the role of semantics in morphology. Marelli, Amenta, and Crepaldi (2015) proposed an orthographic-semantic consistency (OSC) measure that quantifies how well a word’s orthography predicts its meaning. When all words in which a particular form occurs are similar in meaning—as with transparent word relations such as baker and bakery—the vectors for these words should be close in semantic space and OSC will be high. When there is similarity in form but not in meaning—as with opaque word relations such as crypt and cryptic—the vectors should be distant in semantic space and OSC should be low. Marelli et al. showed that OSC is a significant predictor of word recognition times and accounts, at least partially, for the well-known but little-studied effect that transparent words are processed faster than opaque words.
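Schematically, a simplified, unweighted version of OSC can be computed as the mean cosine similarity between a word's vector and the vectors of all words containing it orthographically. The toy vectors below are assumptions chosen so that bake has semantically close relatives while crypt does not; the published measure additionally weights the average by word frequency.

```python
# Schematic, unweighted orthography-semantics consistency (OSC): the
# mean cosine similarity between a word's vector and the vectors of
# all words containing it orthographically. The toy vectors are
# assumptions; the published measure also weights by word frequency.

import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def osc(target: str, vectors: dict) -> float:
    """Mean similarity of `target` to other words containing its form."""
    relatives = [w for w in vectors if target in w and w != target]
    return float(np.mean([cosine(vectors[target], vectors[w])
                          for w in relatives]))

vectors = {
    "bake":    np.array([1.0, 0.1, 0.0]),
    "baker":   np.array([0.9, 0.2, 0.1]),
    "bakery":  np.array([0.8, 0.3, 0.0]),
    "crypt":   np.array([0.0, 1.0, 0.2]),
    "cryptic": np.array([0.1, 0.0, 1.0]),
}

print(round(osc("bake", vectors), 2))   # high: form predicts meaning
print(round(osc("crypt", vectors), 2))  # low: form is misleading
```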

Recently, Mikolov, Yih, and Zweig (2013) showed that vector spaces built using recurrent neural networks can be used to extrapolate relationships between morphologically related forms. As an example, let us assume that, based on the relationship between the singular year and the plural years, we want to infer the plural form for law. In a vector space, the words year, years, law, and laws, like all other words, are represented as equal-length vectors of real numbers. The method proposed by Mikolov et al. consists of first subtracting the vector for year from the vector for years. The result of that subtraction is then added to the vector for law, and the vector resulting from this addition is the predicted vector for the plural of law. Since it is highly improbable that a word vector exists at exactly these coordinates, the final step is to find the closest existing vector in the multidimensional space. If the method is successful, this should be the vector for laws. Mikolov et al. report simulations on verbal, nominal, and adjectival inflection with varying degrees of success. They also demonstrate that vector spaces based on recurrent neural networks produce much better results than vector spaces based on latent semantic analysis.
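The vector-offset method can be sketched with hand-crafted toy vectors. Real vector spaces are learned from corpora and have hundreds of dimensions, so the three-dimensional vectors below are purely illustrative assumptions.

```python
# Sketch of the vector-offset method: predict the plural of 'law'
# from the year/years offset and return the nearest known vector.
# The toy 3-D vectors are assumptions; real spaces are learned from
# corpora and have hundreds of dimensions.

import numpy as np

vectors = {
    "year":  np.array([0.9, 0.1, 0.0]),
    "years": np.array([0.9, 0.1, 1.0]),   # plural adds a 'number' offset
    "law":   np.array([0.1, 0.8, 0.0]),
    "laws":  np.array([0.1, 0.8, 1.0]),
}

def nearest(query, vectors, exclude):
    """Word whose vector is closest to `query` by cosine similarity."""
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in vectors if w not in exclude),
               key=lambda w: cos(query, vectors[w]))

predicted = vectors["years"] - vectors["year"] + vectors["law"]
print(nearest(predicted, vectors, exclude={"year", "years", "law"}))  # laws
```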

Marelli and Baroni (2015) have shown that vector space models can also be used to compute the meaning of derived forms. The contrast with other computational models of morphology that include a semantic component is that the meanings of the words in Marelli and Baroni’s model are completely data-driven. Marelli and Baroni show that affixes can be represented as matrices encoding an optimal mapping from unaffixed words to their affixed versions. For instance, a matrix for the affix re- would be constructed by computing the matrix that best maps vectors for forms such as consider and apply to the vectors of their affixed forms reconsider and reapply. When this matrix is multiplied by the vector for a stem, the result is a vector representing the composed meaning. For instance, multiplying the matrix for re- by the vector for finalize results in a semantic vector for the novel word refinalize. The model correctly predicts semantic intuitions about such novel forms: existing words whose vectors are closest to a constructed vector in semantic space are also judged by human raters to be more similar to the novel form than its stem is. For instance, the vector most similar to the constructed form insultable is that of the existing form reprehensible. When asked whether insultable is more similar to the stem insult or to reprehensible, participants choose the latter, showing that a plausible derivational meaning vector was computed. In addition, the model replicates semantic transparency effects from the experimental literature.
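Schematically, such an affix matrix can be estimated by least squares from pairs of stem and derived-form vectors and then applied to a new stem. The two-dimensional vectors below are illustrative assumptions; Marelli and Baroni estimate their mappings from high-dimensional corpus-derived vectors.

```python
# Sketch of an affix as a linear map: estimate, by least squares, the
# matrix that best sends stem vectors to their affixed counterparts,
# then apply it to a new stem. The 2-D vectors are toy assumptions.

import numpy as np

stems   = np.array([[1.0, 0.2],    # e.g., 'consider'
                    [0.3, 0.9],    # e.g., 'apply'
                    [0.7, 0.5]])   # a third training stem
affixed = np.array([[1.1, 0.4],    # 'reconsider'
                    [0.5, 1.0],    # 'reapply'
                    [0.9, 0.7]])

# Solve stems @ M ≈ affixed for the affix matrix M of 're-'.
M, *_ = np.linalg.lstsq(stems, affixed, rcond=None)

new_stem = np.array([0.6, 0.6])    # e.g., 'finalize'
print(new_stem @ M)                # predicted vector for 'refinalize'
```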

4. Computational Approaches to Morphology in Reading and Recognizing Words

While several computational models of human reading ability have been proposed, the investigation of how morphologically complex words are read within these models has been very limited. One reason is that the most prominent models, which will be discussed below, have focused primarily on modeling the reading of monomorphemic and monosyllabic words.

The Dual Route Cascaded (DRC) model of reading (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001) and the Connectionist Dual Process (CDP+) model (Perry, Ziegler, & Zorzi, 2007) both have a lexical route, in which a word’s spelling is tied to its full pronunciation, and a sublexical route, which can assemble a pronunciation for any letter string. The models share the same lexical route, but in the sublexical route the former uses grapheme-to-phoneme conversion (GPC) rules, while the latter uses a two-layer connectionist network to map between orthography and phonology. These models treat stems and affixes completely differently: While stems can be handled by the lexical route, frequent affixes (e.g., pre-, co-, -ity, -ness) are always handled by the sublexical route if they do not occur in known words.

The triangle model (Harm & Seidenberg, 1999, 2004; Seidenberg & McClelland, 1989) is a connectionist model that contains bidirectional mappings between three components: orthography, phonology, and semantics. In this model, identifying the meaning of a written word can occur directly, via the orthography-to-semantics mapping, or indirectly, via phonology. Reading a written word aloud can happen directly via the orthography-to-phonology mapping or can be mediated by semantics. Early incarnations of the triangle model also set aside the problem of reading morphologically complex words. Harm and Seidenberg (2004), however, implemented the semantic component of the model so that it included feature labels for stems as well as for morphological features, such as affixes marking number or gender. They demonstrated that when such a model is presented with novel forms containing a potential affix, the corresponding morphological labels in the semantic component are strongly activated, suggesting that representing the semantics of both stems and affixes is important when addressing the role of morphology in reading.

Another branch of work using connectionist models has focused on artificial languages and has established that letter sequences corresponding to morphological affixes reliably predict a particular pronunciation, in contrast to sublexical patterns that do not correspond to morphemes (Plaut & Gonnerman, 2000; Rueckl & Raveh, 1999).

Sibley, Kello, Plaut, and Elman (2009) have noted that a problem with all of the implementations above is that they rely on slot-based codes, in which each letter is assigned a particular position-specific slot (e.g., CAT = [C1, A2, T3]). These models are therefore limited to short words, which can easily be aligned in a slot-based representation. Because slot-based schemes force the characters of a word into fixed positions, they ignore similarities between letters at different positions, and models using them have inherent problems extracting common patterns from multimorphemic words. When fitting the words represented, representation, and presented into a slot-based coding scheme, left-alignment would make clear that representation and represented share morphological structure but would fail to uncover the similarity between presented and represented. Right-alignment would acknowledge the similarity between presented and represented but would ignore any similarity between representation and represented.

Several alternatives to slot-based representations have been proposed (see Davis, 2006, for an overview). Most prominent in the psycholinguistic literature are open-bigram coding (Grainger & Whitney, 2004), which represents a word by all of its ordered two-letter combinations (e.g., CATS = [CA, CT, CS, AT, AS, TS]), and spatial coding (Davis, 2010), which gives the first letter the highest activation, the second letter the second-highest activation, and so on. Sibley, Kello, Plaut, and Elman (2008) developed an alternative based on a simple recurrent network (Elman, 1990). Their model learns to encode words of variable length as fixed-length sequences and to decode those fixed-length sequences back into variable-length sequences. Unfortunately, there have been few attempts to apply these coding schemes specifically to the reading of morphologically complex words.
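The contrast between position-specific and more flexible schemes can be made concrete in a small sketch comparing slot coding with unconstrained open-bigram coding on the presented/represented example. The Jaccard overlap used here is an illustrative choice, not a measure proposed by these models.

```python
# Sketch of two input coding schemes: position-specific slot coding
# and unconstrained open-bigram coding. Open bigrams let 'presented'
# and 'represented' share structure despite their misalignment.

from itertools import combinations

def slot_code(word: str) -> set:
    """Bind each letter to its absolute position, e.g., (1, 'c')."""
    return {(i, c) for i, c in enumerate(word, start=1)}

def open_bigrams(word: str) -> set:
    """All ordered letter pairs, regardless of intervening letters."""
    return {a + b for a, b in combinations(word, 2)}

def overlap(a: set, b: set) -> float:
    """Jaccard overlap between two codes (illustrative choice)."""
    return len(a & b) / len(a | b)

print(open_bigrams("cats"))  # {'ca', 'ct', 'cs', 'at', 'as', 'ts'}

# Slot coding sees almost no overlap between the two words, while
# open-bigram coding captures their shared morphological structure.
print(overlap(slot_code("presented"), slot_code("represented")))        # low
print(overlap(open_bigrams("presented"), open_bigrams("represented")))  # high
```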

A recent approach that focuses specifically on morphology in word recognition is naive discriminative learning (Baayen, Milin, Filipović-Đurđević, Hendrix, & Marelli, 2011). The model has many similarities to connectionist models, but it is based on the principles of discriminative learning (Rescorla & Wagner, 1972). Like the triangle model of Harm and Seidenberg (2004), it represents the meanings of stems as well as affixes. However, the meanings of stems and affixes are both linked to the representation of the whole word form, not to a specific part of it. Since the model does not include phonology, it is not a model of reading aloud; it focuses on simulating word identification response times and fixation durations during reading. The model has predicted paradigmatic effects in sentential reading in Serbian, a morphologically highly complex language, as well as many other experimental findings from English.
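At the heart of naive discriminative learning are the Rescorla-Wagner equations, which strengthen or weaken direct associations between cues (e.g., letter pairs) and outcomes (e.g., meanings) in proportion to prediction error. A minimal sketch of one such update follows; the cue and outcome inventories are toy assumptions for illustration.

```python
# Minimal Rescorla-Wagner update, the error-driven rule underlying
# naive discriminative learning: weights from active cues to each
# outcome change in proportion to the prediction error. Cue and
# outcome inventories here are toy assumptions.

import numpy as np

def rw_update(W, cues, outcomes, present_cues, present_outcomes,
              rate=0.1, lam=1.0):
    """One learning event over a cue-by-outcome weight matrix W."""
    active = [cues.index(c) for c in present_cues]
    for j, outcome in enumerate(outcomes):
        prediction = W[active, j].sum()
        target = lam if outcome in present_outcomes else 0.0
        W[active, j] += rate * (target - prediction)   # error-driven step
    return W

cues = ["#h", "ha", "an", "nd", "ds", "s#"]   # letter bigrams of 'hands'
outcomes = ["HAND", "PLURAL"]
W = np.zeros((len(cues), len(outcomes)))

for _ in range(50):                           # repeated exposure to 'hands'
    W = rw_update(W, cues, outcomes, cues, {"HAND", "PLURAL"})
print(W.round(2))  # active cues jointly come to predict both outcomes
```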

5. Critical Analysis and Future Directions

One of the reasons why computational approaches to word formation have flourished is that morphology seems to present a clear testing ground for the view that symbolic processing is a requirement for language (Fodor, 1975; Fodor & Pylyshyn, 1988; Newell & Simon, 1976; Pinker & Prince, 1988). Some morphological domains, such as English past tense formation and German plural formation, appear to present evidence characteristic of symbolic processing, with one regular inflectional suffix applying across the board, regardless of sound (Marcus et al., 1995; Pinker, 1998; Prasada & Pinker, 1993). At the same time, these domains also contain irregular inflectional processes that apply only in limited cases and are characteristic of nonsymbolic, sound-driven processes. Starting with Rumelhart and McClelland (1986), one of the main motivations for developing computational models has been to show that the evidence that seemed characteristic of symbolic processing could be explained by a model that assumes only sound-driven learning of morphology. While Pinker and Prince’s (1988) critique of the early connectionist models remains valuable, later developments (e.g., Hahn & Nakisa, 2000; Keuleers, 2008) have repeatedly demonstrated that what seem to be hallmarks of symbolic processing are in fact fully compatible with a nonsymbolic, data-driven view of morphology. At the same time, there are many differences among these computational approaches, an important one being whether they appeal to intermediate levels of organization (Albright & Hayes, 2003; Skousen, 1989) or to direct form-to-form comparison (Daelemans & van den Bosch, 2005; Keuleers & Daelemans, 2007; Keuleers, 2008). The question of whether such intermediate contexts are required to explain word formation is still extremely relevant and is bound to carry over to theorizing in other areas of psycholinguistics, such as sentence formation. At the same time, it is important to recognize that while these models are data-driven, they still make many assumptions that are symbolic at the subform level: Form representations rely on phonetic symbols, inflectional endings are represented as explicit classes, and so forth. Whether these representations have any psychological reality is not an incidental problem that can simply be set aside. Future research will have to address the formation of representations from sound input. If this development takes place, it will lead to more comprehensive theories of word formation that may be very far removed from current accounts.

In an attempt to simplify the problem of reading, computational psycholinguistic models were initially developed to deal with very short and simple words. When it comes to morphology, many leading models of reading are still hindered by this unfortunate legacy. Even today, the reading of long and morphologically complex forms is not seen as a core problem by models such as the DRC (Coltheart et al., 2001) and CDP+ (Perry, Ziegler, & Zorzi, 2007), which, by design, consider reading as independent of the acquisition and development of relations among words. However, it is becoming increasingly clear that a model of reading and word identification cannot be complete without also being a model of word acquisition, and that models that do not incorporate acquisition are stretched beyond their limits when put to the task of reading and recognizing morphologically complex words. The triangle model (Harm & Seidenberg, 2004) takes a step in that direction by including a semantic component, but only in the naive discriminative model (Baayen et al., 2011) is the implicit learning of relations between forms a central feature of the model.

Future research in computational approaches to morphology may benefit most from explicitly addressing learning. In this respect, it is telling that the naive discriminative learning model (Baayen et al., 2011) and the distributional semantics models developed for meaning identification (Marelli & Baroni, 2015) have a lot in common. The naive discriminative learning model is based on the discriminative learning model of Rescorla and Wagner (1972), which has close mathematical correspondences to the learning rules used in state-of-the-art distributional semantics models based on recurrent neural networks (Mikolov, Yih, & Zweig, 2013). There are also some obvious differences: The focus in neural network implementations of distributional semantics is on the intermediate representations formed in hidden layers, whereas naive discriminative learning does not make use of hidden layers at all. Still, these models offer a valuable insight by showing that problems posed in explicitly linguistic, morphological terminology can be addressed using very simple computational learning principles, without requiring any explicit morphological information.

Further Reading

Albright, A., & Hayes, B. (2003). Rules vs. analogy in English past tenses: A computational/experimental study. Cognition, 90(2), 119–161. doi:10.1016/S0010-0277(03)00146-X

Chandler, S. (2010). The English past tense: Analogy redux. Cognitive Linguistics, 21(3). doi:10.1515/COGL.2010.014

Daelemans, W., & Van den Bosch, A. (2005). Memory-based language processing. Cambridge: Cambridge University Press.

Marelli, M., Amenta, S., & Crepaldi, D. (2015). Semantic transparency in free stems: The effect of Orthography-Semantics Consistency on word recognition. The Quarterly Journal of Experimental Psychology, 68(8), 1571–1583. doi:10.1080/17470218.2014.959709

Pinker, S. (1998). Words and rules. Lingua, 106(1), 219–242.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28(1–2), 73–193.

Rueckl, J. G. (2011). Connectionism and the role of morphology in visual word recognition. The Mental Lexicon, 5(3), 371–400. doi:10.1075/ml.5.3.07rue

Skousen, R. (1989). Analogical modeling of language. Dordrecht, The Netherlands: Kluwer.


References

Albright, A., & Hayes, B. (2003). Rules vs. analogy in English past tenses: A computational/experimental study. Cognition, 90(2), 119–161. doi:10.1016/S0010-0277(03)00146-X

Baayen, R. H., Milin, P., Đurđević, D. F., Hendrix, P., & Marelli, M. (2011). An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review, 118(3), 438–481. doi:10.1037/a0023851

Chandler, S. (2010). The English past tense: Analogy redux. Cognitive Linguistics, 21(3). doi:10.1515/COGL.2010.014

Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. (2001). DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108(1), 204.

Cottrell, G. W., & Plunkett, K. (1994). Acquiring the mapping from meaning to sounds. Connection Science, 6(4), 379–412.

Daelemans, W., Berck, P., & Gillis, S. (1997). Data mining as a method for linguistic analysis: Dutch diminutives. Folia Linguistica, 31(1–2), 57–76.

Daelemans, W., Gillis, S., & Durieux, G. (1997). Skousen’s analogical modeling algorithm: A comparison with lazy learning. In New methods in language processing (pp. 3–15). London: University College Press.

Daelemans, W., & Van den Bosch, A. (2005). Memory-based language processing. Cambridge: Cambridge University Press.

Daelemans, W., Zavrel, J., van der Sloot, K., & Van den Bosch, A. (2004). TiMBL: Tilburg memory-based learner. Tilburg, The Netherlands: Tilburg University.

Davis, C. J. (2006). Orthographic input coding: A review of behavioural evidence and current models. In S. Andrews (Ed.), From inkmarks to ideas: Current issues in lexical processing (pp. 180–206). Hove: Psychology Press.

Davis, C. J. (2010). The spatial coding model of visual word identification. Psychological Review, 117(3), 713.

Eddington, D. (2002a). A comparison of two models: Tilburg memory-based learner versus analogical modeling of language. In R. Skousen, D. Lonsdale, & D. B. Parkinson (Eds.), Analogical modeling: An exemplar-based approach to language (pp. 141–155). Amsterdam: John Benjamins.

Eddington, D. (2002b). Spanish gender assignment in an analogical framework. Journal of Quantitative Linguistics, 9(1), 49–75.

Eddington, D. (2002c). Spanish diminutive formation without rules or constraints. Linguistics, 40(2), 395–420.

Eddington, D. (2004). Issues in modeling language processing analogically. Lingua, 114(7), 849–871.

Eddington, D. (2009). Spanish verbal inflection: A single- or dual-route system? Linguistics, 47(1), 173–199.

Eddington, D., & Lonsdale, D. (2007). Analogical modeling: An update (Unpublished manuscript). Brigham Young University, Provo, UT.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211.

Fix, E., & Hodges, J. L. (1951). Discriminatory analysis. Nonparametric discrimination: Consistency properties (Technical report). Randolph Field, TX: USAF School of Aviation Medicine.

Fodor, J. A. (1975). The language of thought. Cambridge, MA: Harvard University Press.

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28(1–2), 3–71. doi:10.1016/0010-0277(88)90031-5

Grainger, J., & Whitney, C. (2004). Does the huamn mnid raed wrods as a wlohe? Trends in Cognitive Sciences, 8(2), 58–59.

Hahn, U., & Nakisa, R. C. (2000). German inflection: Single route or dual route? Cognitive Psychology, 41(4), 313–360.

Harm, M. W., & Seidenberg, M. S. (1999). Phonology, reading acquisition, and dyslexia: Insights from connectionist models. Psychological Review, 106(3), 491–528. doi:10.1037/0033-295X.106.3.491

Harm, M. W., & Seidenberg, M. S. (2004). Computing the meanings of words in reading: Cooperative division of labor between visual and phonological processes. Psychological Review, 111(3), 662–720. doi:10.1037/0033-295X.111.3.662

Joanisse, M. F., & Seidenberg, M. S. (1999). Impairments in verb morphology after brain injury: A connectionist model. Proceedings of the National Academy of Sciences, 96(13), 7592–7597.

Keuleers, E. (2008). Memory-based learning of inflectional morphology (Unpublished doctoral dissertation). University of Antwerp.

Keuleers, E., & Daelemans, W. (2007). Memory-based learning models of inflectional morphology: A methodological case-study. Lingue e Linguaggio, 6(2), 151–174. doi:10.1418/25649

Keuleers, E., Sandra, D., Daelemans, W., Gillis, S., Durieux, G., & Martens, E. (2007). Dutch plural inflection: The exception that proves the analogy. Cognitive Psychology, 54(4), 283–318. doi:10.1016/j.cogpsych.2006.07.002

Krott, A., Baayen, R. H., & Schreuder, R. (2001). Analogy in morphology: Modeling the choice of linking morphemes in Dutch. Linguistics, 39(1), 51–94.

Marcus, G. F., Brinkmann, U., Clahsen, H., Wiese, R., & Pinker, S. (1995). German inflection: The exception that proves the rule. Cognitive Psychology, 29(3), 189–256. doi:10.1006/cogp.1995.1015

Marelli, M., Amenta, S., & Crepaldi, D. (2015). Semantic transparency in free stems: The effect of Orthography-Semantics Consistency on word recognition. The Quarterly Journal of Experimental Psychology, 68(8), 1571–1583. doi:10.1080/17470218.2014.959709

Marelli, M., & Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122(3), 485–515.

Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of HLT-NAACL (pp. 746–751).

Milin, P., Keuleers, E., & Filipović-Đurđević, D. (2011). Allomorphic responses in Serbian pseudo-nouns as a result of analogical learning. Acta Linguistica Hungarica, 58(1), 65–84. doi:10.1556/ALing.58.2011.1-2.4

Nakisa, R. C., & Hahn, U. (1996). Where defaults don’t help: The case of the German plural system. In Proceedings of the 18th Annual Conference of the Cognitive Science Society (pp. 177–182).

Newell, A., & Simon, H. A. (1976). Computer science as empirical inquiry: Symbols and search. Communications of the ACM, 19(3), 113–126.

Paice, C. D. (1994). An evaluation method for stemming algorithms. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 42–50). New York: Springer-Verlag.

Perry, C., Ziegler, J. C., & Zorzi, M. (2007). Nested incremental modeling in the development of computational theories: The CDP+ model of reading aloud. Psychological Review, 114(2), 273.

Pinker, S. (1998). Words and rules. Lingua, 106(1), 219–242.

Pinker, S., & Prince, A. (1988). On language and connectionism: Analysis of a parallel distributed processing model of language acquisition. Cognition, 28(1–2), 73–193.

Plaut, D. C., & Gonnerman, L. M. (2000). Are non-semantic morphological effects incompatible with a distributed connectionist approach to lexical processing? Language and Cognitive Processes, 15(4–5), 445–485.

Plunkett, K., & Marchman, V. (1991). U-shaped learning and frequency effects in a multi-layered perceptron: Implications for child language acquisition. Cognition, 38(1), 43–102.

Plunkett, K., & Marchman, V. (1993). From rote learning to system building: Acquiring verb morphology in children and connectionist nets. Cognition, 48(1), 21–69.

Porter, M. F. (2006). An algorithm for suffix stripping. Program: Electronic Library and Information Systems, 40(3), 211–218. doi:10.1108/00330330610681286

Prasada, S., & Pinker, S. (1993). Generalisation of regular and irregular morphological patterns. Language and Cognitive Processes, 8(1), 1–56.

Rescorla, R. A., & Wagner, A. R. (1972). A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement. In A. H. Black & W. F. Prokasy (Eds.), Classical conditioning: Current research and theory. New York: Appleton-Century-Crofts.

Rueckl, J. G., & Raveh, M. (1999). The influence of morphological regularities on the dynamics of a connectionist network. Brain and Language, 68(1), 110–117.

Rumelhart, D. E., & McClelland, J. L. (1986). On learning the past tenses of English verbs. In J. L. McClelland & D. E. Rumelhart (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 2: Psychological and biological models (pp. 216–271). Cambridge, MA: MIT Press.

Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96(4), 523.

Sibley, D. E., Kello, C. T., Plaut, D. C., & Elman, J. L. (2008). Large-scale modeling of wordform learning and representation. Cognitive Science, 32(4), 741–754. doi:10.1080/03640210802066964

Sibley, D. E., Kello, C. T., Plaut, D. C., & Elman, J. L. (2009). Sequence encoders enable large-scale lexical modeling: Reply to Bowers and Davis (2009). Cognitive Science, 33(7), 1187–1191. doi:10.1111/j.1551-6709.2009.01064.x

Skousen, R. (1989). Analogical modeling of language. Dordrecht, The Netherlands: Kluwer.

Su, K.-Y., Wu, M.-W., & Chang, J.-S. (1994). A corpus-based approach to automatic compound extraction. In Proceedings of the 32nd Annual Meeting on Association for Computational Linguistics (pp. 242–247). Stroudsburg, PA: Association for Computational Linguistics. doi:10.3115/981732.981765

Veríssimo, J., & Clahsen, H. (2014). Variables and similarity in linguistic generalization: Evidence from inflectional classes in Portuguese. Journal of Memory and Language, 76, 61–79.


(1.) MBLP is often used interchangeably with MBL (Memory-Based Learning). Early sources often use MBL; since Daelemans and van den Bosch (2005), MBLP seems to be the preferred name.

(2.) The approach and its implementation were originally referred to as AML (Analogical Modeling of Language). In recent years, the approach has become known as AM.