The word accent system of Tokyo Japanese might look quite complex, with its number of accent patterns and rules. However, recent research has shown that it is not as complex as has been assumed if one incorporates the notion of markedness into the analysis: nouns have only two productive accent patterns, the antepenultimate pattern and the unaccented pattern, rather than the multiple patterns previously assumed. Seemingly different accent rules can be generalized if one focuses on these productive accent patterns.
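As a rough illustration, the antepenultimate pattern places the accent on the third mora from the end of the word. The following is a minimal sketch, assuming nouns are represented simply as lists of moras; the function and representation are illustrative simplifications, and the real rule also interacts with syllable structure, which this toy version ignores.

```python
# A minimal sketch of the antepenultimate accent rule, assuming a noun
# is represented as a list of moras. Illustrative simplification only:
# the actual rule interacts with syllable structure.

def antepenultimate_accent(moras):
    """Return the 1-based index of the accented mora: the third mora
    from the end, falling back to the first mora in shorter nouns."""
    if len(moras) < 3:
        return 1
    return len(moras) - 2

print(antepenultimate_accent(["ba", "na", "na"]))       # 1 -> BAnana
print(antepenultimate_accent(["a", "me", "ri", "ka"]))  # 2 -> aMErika
```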
The word accent system raises some interesting new issues. One of them concerns the fact that a majority of nouns are “unaccented,” that is, they are pronounced with a rather flat pitch pattern, apparently violating the principle of obligatoriness. A careful analysis of noun accentuation reveals that this strange accent pattern occurs in certain linguistically predictable structures. In morphologically simplex nouns, it tends to emerge in four-mora nouns ending in a sequence of light syllables. In compound nouns, on the other hand, it emerges due to multiple factors, such as compound-final de-accenting or pseudo-de-accenting morphemes, certain syntactic categories, and certain prosodic configurations.
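The distributional tendency for simplex nouns can be stated as a toy predictor, sketched below under strong simplifying assumptions: syllables are represented only by their mora counts, and a statistical tendency is treated as a binary rule.

```python
# A toy predictor for the unaccented pattern in simplex nouns, encoding
# only the tendency described above: four-mora nouns ending in a
# sequence of light (monomoraic) syllables tend to be unaccented.

def likely_unaccented(syllable_weights):
    """syllable_weights: mora count per syllable, e.g., [1, 1, 1, 1]
    for mu-ra-sa-ki 'purple' (an unaccented noun)."""
    four_moras = sum(syllable_weights) == 4
    ends_in_lights = syllable_weights[-2:] == [1, 1]
    return four_moras and ends_in_lights

print(likely_unaccented([1, 1, 1, 1]))  # murasaki -> True
print(likely_unaccented([2, 2]))        # ro-N-do-N 'London' (accented) -> False
```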
Japanese pitch accent exhibits an interesting aspect in its interactions with other phonological and linguistic structures. For example, the accent of compound nouns is closely related to rendaku, or sequential voicing, which is also characteristic of compounds: the choice between the accented and unaccented patterns in certain types of compound nouns correlates with the presence or absence of sequential voicing. Moreover, whether the compound accent rule applies to a given compound depends on its internal morphosyntactic configuration as well as its meaning; in other words, the compound accent rule is blocked in certain types of morphosyntactic and semantic structures.
Japanese pitch accent also displays interesting features in domains beyond the word. For example, it actively participates in downstep, an intonational process by which word accent lowers the pitch range of subsequent material in the same utterance. It also exhibits intriguing properties with respect to the ways accent neutralization is avoided in sentence-level phonology.
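To make downstep concrete, the sketch below implements a toy model in which each lexically accented word compresses the pitch register of everything that follows it in the utterance. The register values and the 0.7 compression factor are illustrative assumptions, not empirical estimates.

```python
# A toy model of downstep: each accented word lowers the relative pitch
# register for all following words in the same utterance.

def downstep_registers(words, compression=0.7):
    """words: list of (word, is_accented) pairs; returns a list of
    (word, relative_register) pairs, starting from register 1.0."""
    register = 1.0
    out = []
    for word, is_accented in words:
        out.append((word, register))
        if is_accented:
            register *= compression  # accent triggers downstep
    return out

# A hypothetical utterance of two accented words and one unaccented word:
utterance = [("accented-1", True), ("accented-2", True), ("unaccented", False)]
for word, reg in downstep_registers(utterance):
    print(f"{word}: register {reg:.2f}")  # 1.00, 0.70, 0.49
```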
Finally, careful analysis of word accent sheds new light on the syllable structure of the language, notably on two interrelated questions concerning diphthonghood and super-heavy syllables. It provides crucial insight into which vowel sequences constitute diphthongs, as opposed to vowel sequences spanning syllable boundaries. It also presents new evidence against trimoraic syllables in the language.
Marie K. Huffman
Articulatory phonetics is concerned with the physical mechanisms involved in producing spoken language. A fundamental goal of articulatory phonetics is to relate linguistic representations to articulator movements in real time and the consequent acoustic output that makes speech a medium for information transfer. Understanding the overall process requires an appreciation of the aerodynamic conditions necessary for sound production and the way that the various parts of the chest, neck, and head are used to produce speech. One descriptive goal of articulatory phonetics is the efficient and consistent description of the key articulatory properties that distinguish sounds used contrastively in language. There is fairly strong consensus in the field about the inventory of terms needed to achieve this goal. Despite this common segmental perspective, speech production is essentially dynamic in nature. Much remains to be learned about how the articulators are coordinated for production of individual sounds and how they are coordinated to produce sounds in sequence. Cutting across all of these issues is the broader question of which aspects of speech production are due to properties of the physical mechanism and which are the result of the nature of linguistic representations. A diversity of approaches is used to try to tease apart the physical and the linguistic contributions to the articulatory fabric of speech sounds in the world’s languages. A variety of instrumental techniques are currently available, and improvement in safe methods of tracking articulators in real time promises to soon bring major advances in our understanding of how speech is produced.
Coarticulation can be characterized as an articulatory effect exerted by one phonetic segment (the trigger) on another (the target) in the speech chain, for example, anticipatory velar lowering during a vowel preceding a syllable-final nasal consonant (send) or tongue body raising and fronting during a schwa placed next to a palatal consonant (the shore, ashamed). Coarticulatory effects have generally been investigated with reference to a single articulator (e.g., velum, lips, tongue tip, tongue body, jaw, larynx) or a given acoustic parameter (e.g., second formant). It is therefore convenient to keep this concept separate from gestural coproduction, which refers to the spatiotemporal interaction among different articulatory structures during the realization of one or several successive phonetic segments.
Coarticulation may be measured in space and time. Thus, tongue body raising and fronting effects exerted by palatal consonants on an immediately preceding schwa are predicted to be larger and to start earlier than those exerted by the same consonant type on a preceding low or mid vowel. Moreover, the spatiotemporal effects in question may differ in direction: they may be anticipatory and thus proceed leftwards towards the preceding segment(s), or they may be carryover and thus proceed rightwards towards the following segment(s); it is commonly accepted that anticipatory effects reflect phonemic planning, while carryover effects are mainly associated with the physico-mechanical requirements of the articulatory structures. The magnitude, temporal extent, and direction of the coarticulatory effects are conditioned by the place and manner of articulation of the triggering and target consonants and/or vowels, as well as by the articulatory subsystem involved in closure or constriction formation. Depending on their articulatory characteristics, vowels and consonants may differ regarding coarticulation resistance and aggressiveness, namely, the degree to which they block coarticulatory effects from contextual segments (resistance) and modify the articulatory characteristics of other segments (aggressiveness); thus, in a CV sequence composed of a palatal consonant and a schwa, the palatal segment is more coarticulation resistant and aggressive than the schwa. Other factors affecting coarticulation are segmental position within the word and the utterance and with respect to word and sentence stress, as well as sequence type (VCV, CC, and so on), speech rate, speaker, and language.
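As an illustration of how magnitude and temporal extent might be quantified, the sketch below compares an F2 trajectory for a target vowel in a triggering context with the same vowel in a neutral context. The formant values and the 20 Hz divergence threshold are invented for illustration; real studies use many tokens and statistical testing.

```python
# Quantifying coarticulation from a formant track: magnitude = peak F2
# difference between contexts; temporal extent = how early in the vowel
# the two contexts start to diverge.

def coarticulation_profile(f2_context, f2_neutral, threshold=20.0):
    """Both inputs: F2 values (Hz) at equally spaced points across the
    target vowel. Returns (magnitude_hz, onset_fraction)."""
    diffs = [abs(a - b) for a, b in zip(f2_context, f2_neutral)]
    magnitude = max(diffs)
    onset = next((i for i, d in enumerate(diffs) if d > threshold),
                 len(diffs))
    return magnitude, onset / len(diffs)

# Schwa F2 before a palatal consonant vs. in a neutral context (made up):
before_palatal = [1500, 1520, 1580, 1680, 1800]
neutral = [1500, 1505, 1510, 1515, 1520]
print(coarticulation_profile(before_palatal, neutral))  # (280, 0.4)
```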
The study of coarticulation provides information about the spatiotemporal mechanisms used by speakers for the production of phonemic sequences, about phonemic planning strategies in speech, and about sound change patterns and assimilatory processes. It has traditionally been assumed that coarticulatory effects are phonetic and thus gradual, variable, and universal, while assimilations are phonological and thus categorical, systematic, and language-specific. Thus, for example, tongue body raising and fronting effects from a palatal consonant during a schwa occur to a greater or lesser extent in any speech production event (coarticulation), but may be labeled assimilatory only if they give rise to a higher and more fronted vowel, such as /e/ or /i/, in a subset of lexical items or across the lexicon of a given language (assimilation). Experimental evidence shows, however, that the division between coarticulation and assimilation is not so straightforward. Indeed, coarticulatory effects may exhibit language-dependent differences (e.g., languages may differ regarding the degree of anticipatory vowel nasalization triggered by a syllable-final nasal consonant), while processes that have been traditionally considered to be assimilatory are far from applying categorically and systematically (e.g., the extent to which /n/ assimilates in place of articulation to a following consonant in English or German may vary with the consonant itself, speaker, prosodic factors, and speech rate).
Holger Diessel and Martin Hilpert
Until recently, theoretical linguists have paid little attention to the frequency of linguistic elements in grammar and grammatical development. It is a standard assumption of (most) grammatical theories that the study of grammar (or competence) must be separated from the study of language use (or performance). However, this view of language has been called into question by various strands of research that have emphasized the importance of frequency for the analysis of linguistic structure. In this research, linguistic structure is often characterized as an emergent phenomenon shaped by general cognitive processes such as analogy, categorization, and automatization, which are crucially influenced by frequency of occurrence.
There are many different ways in which frequency affects the processing and development of linguistic structure. Historical linguists have shown that frequent strings of linguistic elements are prone to undergo phonetic reduction and coalescence, and that frequent expressions and constructions are more resistant to structure mapping and analogical leveling than infrequent ones. Cognitive linguists have argued that the organization of constituent structure and embedding is based on the language users’ experience with linguistic sequences, and that the productivity of grammatical schemas or rules is determined by the combined effect of frequency and similarity. Child language researchers have demonstrated that frequency of occurrence plays an important role in the segmentation of the speech stream and the acquisition of syntactic categories, and that the statistical properties of the ambient language are much more regular than commonly assumed. And finally, psycholinguists have shown that structural ambiguities in sentence processing can often be resolved by lexical and structural frequencies, and that speakers’ choices between alternative constructions in language production are related to their experience with particular linguistic forms and meanings. Taken together, this research suggests that our knowledge of grammar is grounded in experience.
Young-mee Yu Cho
Due to a number of unusual and interesting properties, Korean phonetics and phonology have generated productive discussion within modern linguistic theories, starting from structuralism, moving to classical generative grammar, and more recently to post-generative frameworks such as Autosegmental Theory, Government Phonology, and Optimality Theory, among others. In addition, it has become clear that important issues in phonology cannot be properly described without reference to the interface between phonetics and phonology on the one hand, and between phonology and morphosyntax on the other. Some phonological issues in Standard Korean are still under debate and will likely be of value in helping to elucidate universal phonological properties with regard to phonation contrast, vowel and consonant inventories, consonantal markedness, and the motivation for prosodic organization in the lexicon.
Kodi Weatherholtz and T. Florian Jaeger
The seeming ease with which we usually understand each other belies the complexity of the processes that underlie speech perception. One of the biggest computational challenges is that different talkers realize the same speech categories (e.g., /p/) in physically different ways. We review the mixture of processes that enable robust speech understanding across talkers despite this lack of invariance. These processes range from automatic pre-speech adjustments of the distribution of energy over acoustic frequencies (normalization) to implicit statistical learning of talker-specific properties (adaptation, perceptual recalibration) to the generalization of these patterns across groups of talkers (e.g., gender differences).
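As one concrete illustration of normalization, the sketch below implements Lobanov z-scoring, a widely used scheme in which each talker's formant measurements are rescaled by that talker's own mean and standard deviation. This is my choice of example rather than necessarily the specific mechanism reviewed here.

```python
# Lobanov normalization: z-score each talker's formant values against
# that talker's own distribution, so measurements from talkers with
# different vocal tract sizes become comparable.

import statistics

def lobanov(formants_hz):
    """Z-score one talker's formant measurements (a list of Hz values,
    ideally covering the talker's full vowel space)."""
    mean = statistics.mean(formants_hz)
    sd = statistics.stdev(formants_hz)
    return [(f - mean) / sd for f in formants_hz]

# Made-up F1 values for the same three vowels from two talkers:
print(lobanov([300, 500, 700]))  # [-1.0, 0.0, 1.0]
print(lobanov([390, 650, 910]))  # [-1.0, 0.0, 1.0] -> now comparable
```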
Patrice Speeter Beddor
In their conversational interactions with speakers, listeners aim to understand what a speaker is saying, that is, they aim to arrive at the linguistic message, which is interwoven with the social and other information conveyed by the input speech signal. Across the more than 60 years of speech perception research, a foundational issue has been to account for listeners’ ability to achieve stable linguistic percepts corresponding to the speaker’s intended message despite highly variable acoustic signals. Research has especially focused on acoustic variants attributable to the phonetic context in which a given phonological form occurs and on variants attributable to the particular speaker who produced the signal. These context- and speaker-dependent variants reveal the complex—albeit informationally rich—patterns that bombard listeners in their everyday interactions.
How do listeners deal with these variable acoustic patterns? Empirical studies that address this question provide clear evidence that perception is a malleable, dynamic, and active process. Findings show that listeners perceptually factor out, or compensate for, the variation due to context yet also use that same variation in deciding what a speaker has said. Similarly, listeners adjust, or normalize, for the variation introduced by speakers who differ in their anatomical and socio-indexical characteristics, yet listeners also use that socially structured variation to facilitate their linguistic judgments. Investigations of the time course of perception show that these perceptual accommodations occur rapidly, as the acoustic signal unfolds in real time. Thus, listeners closely attend to the phonetic details made available by different contexts and different speakers. The structured, lawful nature of this variation informs perception.
Speech perception changes over time not only in listeners’ moment-by-moment processing, but also across the life span of individuals as they acquire their native language(s), non-native languages, and new dialects and as they encounter other novel speech experiences. These listener-specific experiences contribute to individual differences in perceptual processing. However, even listeners from linguistically homogeneous backgrounds differ in their attention to the various acoustic properties that simultaneously convey linguistically and socially meaningful information. The nature and source of listener-specific perceptual strategies serve as an important window on perceptual processing and on how that processing might contribute to sound change.
Theories of speech perception aim to explain how listeners interpret the input acoustic signal as linguistic forms. A theoretical account should specify the principles that underlie accurate, stable, flexible, and dynamic perception as achieved by different listeners in different contexts. Current theories differ in their conception of the nature of the information that listeners recover from the acoustic signal, with one fundamental distinction being whether the recovered information is gestural or auditory. Current approaches also differ in their conception of the nature of phonological representations in relation to speech perception, although there is increasing consensus that these representations are more detailed than the abstract, invariant representations of traditional formal phonology. Ongoing work in this area investigates how both abstract information and detailed acoustic information are stored and retrieved, and how best to integrate these types of information in a single theoretical model.
Harry van der Hulst
The subject of this article is vowel harmony. In its prototypical form, this phenomenon involves agreement between all vowels in a word for some phonological property (such as palatality, labiality, height, or tongue root position). This agreement is then evidenced by agreement patterns within morphemes and by alternations in vowels when morphemes are combined into complex words, thus creating allomorphic alternations. Agreement involves one or more harmonic features for which vowels form two sets of harmonic pairs, such that each vowel in one set has a harmonic counterpart in the other set. I will focus on vowels that fail to alternate and are thus neutral (either inherently or in a specific context); such vowels are either opaque or transparent to the process. I will compare approaches that use underspecification of binary features and approaches that use unary features. For vowel harmony, vowels are either triggers or targets, and for each, specific conditions may apply. Vowel harmony can be bidirectional or unidirectional and can display either a root control pattern or a dominant/recessive pattern.
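As a concrete illustration of root control, the sketch below picks a suffix allomorph according to the backness of the stem's rightmost harmonic vowel, loosely modeled on Turkish plural allomorphy (-lar/-ler). The vowel classification is simplified, and the fallback for stems without a harmonic vowel is an illustrative assumption.

```python
# Root-controlled palatal (front/back) harmony: the suffix agrees in
# backness with the nearest harmonic vowel of the stem.

FRONT = set("eiöü")
BACK = set("aıou")

def plural(stem):
    """Attach the plural suffix allomorph that harmonizes with the
    stem's rightmost harmonic vowel."""
    for ch in reversed(stem):
        if ch in BACK:
            return stem + "lar"
        if ch in FRONT:
            return stem + "ler"
    return stem + "ler"  # assumed default for stems lacking a harmonic vowel

print(plural("kitap"))  # kitaplar (last harmonic vowel is back /a/)
print(plural("ev"))     # evler (front /e/)
```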