Summary and Keywords
Articulatory phonetics is concerned with the physical mechanisms involved in producing spoken language. A fundamental goal of articulatory phonetics is to relate linguistic representations to articulator movements in real time and the consequent acoustic output that makes speech a medium for information transfer. Understanding the overall process requires an appreciation of the aerodynamic conditions necessary for sound production and the way that the various parts of the chest, neck, and head are used to produce speech. One descriptive goal of articulatory phonetics is the efficient and consistent description of the key articulatory properties that distinguish sounds used contrastively in language. There is fairly strong consensus in the field about the inventory of terms needed to achieve this goal. Despite this common segmental perspective, speech production is essentially dynamic in nature. Much remains to be learned about how the articulators are coordinated for production of individual sounds and how they are coordinated to produce sounds in sequence. Cutting across all of these issues is the broader question of which aspects of speech production are due to properties of the physical mechanism and which are the result of the nature of linguistic representations. A diversity of approaches is used to try to tease apart the physical and the linguistic contributions to the articulatory fabric of speech sounds in the world’s languages. A variety of instrumental techniques are currently available, and improvement in safe methods of tracking articulators in real time promises to soon bring major advances in our understanding of how speech is produced.
1. Scope and Goals of Articulatory Phonetics
Articulatory phonetics is concerned with the physical mechanisms involved in producing spoken language. Since its primary concern is the production of the sounds of spoken language, it is a part of linguistics that, as practiced by many, is considered part of cognitive science. As part of linguistics, it is concerned with how the physical properties of sounds vary between languages and the ways that linguistic structure contributes to the physical fabric of speech in the sounds of the world’s languages. One goal of articulatory phonetics is to provide an efficient and consistent description of the key articulatory properties that distinguish sounds used contrastively in language. A number of classification systems have been developed, differing mostly in the degree of resolution they provide in representing small phonetic differences between languages and dialects. From a linguistic perspective, key research questions in this field include which aspects of articulator movement and coordination follow from properties of the articulators, or the neural structures that execute articulatory goals, and which are under linguistic control. The expansion of the data foundation to include a wider range of speech styles, along with steady improvement in technological options for imaging articulators and tracking speech processing in the brain, promises to keep articulatory phonetics a dynamic and fruitful area of study.
2. Approaches to Studying Articulatory Phonetics
The primary source of data on speech production is observation of the range of movements humans make while speaking. One approach focuses on understanding the basic properties of the articulators apart from their linguistic function. Here, research from speech pathology to dental anatomy can potentially inform our understanding of the speech mechanism. Another lens on these questions involves the basic cataloging and description of the sounds used contrastively in languages. These data minimally include characterizations of contrastive phonemes in minimal, or near-minimal, pairs. Ladefoged and Maddieson (1996) is a compilation of instrumental data on sounds of the world’s languages, representing several decades of work by researchers at UCLA and their collaborators around the world.
In addition to documenting human speech sounds across languages at a phonemic grain of analysis, the last half of the twentieth century also saw development of a strong research emphasis on documenting the range of human phonetic capabilities from another perspective, which combined attention to within-language phonetic variability and a comparison of this variation between languages. Common ways of enriching the data foundation in this area include comparing articulation of the same items at different rates of speech, or different styles of speech, and comparing the same or similar sounds in different prosodic positions, manipulating location relative to stress. For example, articulatory gestures are often faster or longer in syllable onsets or in word-initial position and slower or less extensive in final positions. However, the details of how these effects play out vary between languages. While influential early theories of phonology assumed that the phonetic details of speech production, especially change over time, were more or less “automatic” products of the speech mechanism, a half century of phonetic analysis has revealed tremendous variability both between languages in the finer aspects of articulation, and within languages in different linguistic contexts and styles of speech. This in turn opened up a whole research focus on fine phonetic detail which, since it varies between languages, must be learned, and therefore must be under linguistic control (see Research in Articulatory Phonetics; see also Beckman & Kingston, 2012).
3. Anatomy for Speech Sounds
The respiratory organs, including the lungs and the muscles that support respiration, such as the diaphragm and the intercostal muscles between the ribs, provide the aerodynamic essentials for the majority of speech sounds, which involve modifications to lung air as it passes up from the lungs and out the mouth and/or nose. The resulting flow of air is then manipulated by movements of articulators to produce sound.
The first point at which modifications are made to the airflow is the larynx, where the vocal folds can be positioned to produce sound. The most common adjustment sets the position and stiffness of the vocal folds to favor modal voicing, the phonation type used most widely cross-linguistically. The vocal folds are close enough to touch but do not obstruct airflow completely, and they are appropriately tensed such that they can oscillate with the air pressure changes that result from the Bernoulli effect. The rate of vocal fold vibration determines the perceived pitch, which is used contrastively in languages with tone or pitch accent. Beyond tone and presence or absence of voicing, the mode of vocal fold vibration can also be changed, producing breathy or more creaky voice qualities. Further downstream, movement of the velum (soft palate) regulates whether air passes into the nose. A lowered velum allows air to pass into the nose but does not block air from passing into the mouth. For nasal stop consonants, due to complete obstruction in the mouth, the nose is the sole path for air to exit the vocal tract; for nasal vowels, air exits both mouth and nose.
The oral articulators can be combined in a wide variety of ways to modify the air stream. Most of this movement is accomplished through action of the tongue, lower jaw, and lower lip. Different regions of the tongue may act as the primary articulator for a consonant, and these different regions are designated with specific labels (from front to back: tip, blade, front, center, back, and root). For consonants, different degrees of constriction between articulators produce different aerodynamic effects, ranging from stops, where airflow is temporarily blocked completely, through fricatives, in which air forced through a small constriction generates noise, to approximants, in which articulator movements produce an audible change in the sound wave but no independent acoustic event such as a burst transient or noise is created. Quick movements of the tongue tip, lips, or uvula can also be produced, resulting in taps, flaps, and trills. For vowels, the body of the tongue as a whole is moved and shaped for individual vowel targets, with accompanying targets for different degrees of lip protrusion or spreading.
For the articulators outlined above, there are additional manipulations that can be used to produce linguistic contrasts. For example, beginning with the movement of air, in addition to the universally employed egressive pulmonic airflow, some speech sounds involve trapping and manipulating pockets of air, mostly to produce variants of stops. Clicks involve a pocket of air enclosed between a velar or uvular constriction and a more forward closure, which is then rarefied through downward movement of the tongue body. The inward movement of air on release makes the characteristic hollow sound of this set of consonants used in languages of southern Africa. More common are ejectives, which involve trapping air between the glottis and a more forward articulation and then compressing it through slight larynx raising. The release of this high-pressure air mass produces a loud burst with a characteristic lag before vocal fold vibration can begin. For the oral articulators, additional contrasts can also be produced—for example, by manipulating the length of time a sound is held. Languages employ length contrasts on both consonants and vowels. Sounds may also involve multiple articulators and multiple constrictions. As noted above, vowels involve not only tongue positioning but also specific lip configurations, and possibly velum lowering as well. Consonants may involve multiple oral constrictions, ranging from doubly articulated stops like the labiovelars kp and gb to secondary articulations such as palatalization, which involves a high front tongue constriction that is active while another more complete constriction is made at another point in the vocal tract.
Clearly, with the basic anatomy just described, a large number of speech sounds can be produced. Any articulatory dimensions that can be manipulated independently can be used in combination to produce sound contrasts. For example, since the velum can move independently of the tongue, we can produce nasal stops at all places of articulation forward of the velum. Furthermore, since rate of vocal fold vibration can be manipulated independently of tongue position, languages can produce multiple contrasting tones or tone contours on the same vowel quality. Certain combinations of articulations are impossible due to limits of the vocal tract. For example, pharyngeal nasal stops are not possible because a pharyngeal closure would preclude airflow through the nose. Ultimately, the use languages make of the physically possible sounds is restricted either by perceptibility (e.g., languages usually don’t have as many nasal vowels as oral vowels) or by articulatory or aerodynamic boundary conditions (e.g., a voiceless approximant is hard to distinguish from a fricative).
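The combinatorial logic sketched in this paragraph, where independently controllable dimensions cross-classify to yield a space of possible sounds, minus physically impossible combinations, can be illustrated with a few lines of Python. The dimension values below are deliberately simplified stand-ins, not a complete feature system:

```python
from itertools import product

# Illustrative, simplified dimension values (not a full feature inventory).
places = ["bilabial", "alveolar", "velar", "uvular", "pharyngeal"]
voicing = ["voiceless", "voiced"]
nasality = ["oral", "nasal"]

def physically_possible(place, nasal):
    # A pharyngeal closure would block airflow to the nose as well,
    # so pharyngeal nasal stops are ruled out (as noted in the text).
    return not (nasal == "nasal" and place == "pharyngeal")

stops = [(v, n, p)
         for v, n, p in product(voicing, nasality, places)
         if physically_possible(p, n)]

print(len(stops))  # → 18: 2 x 2 x 5 = 20 combinations minus 2 impossible ones
```

The same pattern extends to further independent dimensions (tone, length, secondary articulations), which is why even a modest set of articulatory controls yields a large space of contrastive possibilities.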
Many excellent and detailed descriptions of speech anatomy are available in textbooks for speech-language pathology courses, and much basic research in speech anatomy is done by both phoneticians and speech scientists. For more in-depth discussion of speech anatomy for phoneticians, see Reetz and Jongman (2011), Gick, Wilson, and Derrick (2013), and other references in Huffman (2011). Several very useful online resources for envisioning articulation are also available. One, a collaboration called “Seeing Speech,” includes ultrasound and MRI images of a male and a female phonetician producing a wide variety of speech sounds. The textbook companion site for Gick et al. (2013) has MRI images for English phonemes and some sounds from other languages. Finally, the University of Iowa “Sounds of Speech” resource has demonstrations with video, audio, and schematic midsagittal views for sounds of English, Spanish, and German, also available as an app.
4. Classifying Articulatory Possibilities
How do we organize all this information in a way that allows for consistent and meaningful descriptions of speech sounds? There is fairly strong consensus in the field about the inventory of terms needed to describe and distinguish sounds that function contrastively in language. It should be noted, however, that the feature set employed in phonetics differs in several respects from common usage of distinctive features in phonology (see Hall & Mielke, 2011). Here we will briefly outline the physical properties addressed in definitional descriptions of speech sounds and discuss some less common aspects for illustrative purposes.
In phonetic descriptions, vowels and consonants employ fairly distinct sets of terms. Consonant descriptions minimally specify voicing (voiced or voiceless), place of articulation, and degree or manner of articulation.
Place of articulation terms for consonants allow for all combinations of upper and lower articulators that are used in languages, with tongue involvement being assumed and unstated, except where it must be specified because it is unusual (e.g., linguolabial sounds). Because of the way the airflow out of the vocal tract responds to different kinds of constrictions, degree of constriction is divided into roughly three major categories, with some additional categories due to special timing or gestural properties. Complete constriction in the mouth produces a stop. An incomplete constriction that is small enough to generate turbulent airflow produces a fricative. A constriction that does not produce turbulent airflow is normally classified as an approximant (or, in some systems, a “resonant”). Common exceptions to this basic division include taps, which involve a very fast movement of the tongue tip that ensures neither complete closure nor the conditions for frication, and affricates, which involve a stop followed by a brief fricative state. For some consonants, the shape of the tongue is also critical to sound production. Some of these are regularly reflected in articulatory descriptions. For example, retroflexed sounds involve curling up of the tongue so the underside of the tip is used to make the constriction. Retroflexion is often treated as a special case of place of articulation. While the majority of consonants are oral, most languages have one or more nasal consonants, so the distinction is reflected in the terms oral/nasal.
For vowels, the common descriptive terms are height, backness, and rounding, and, as noted earlier, the first two are only roughly articulatory. With no specific point of contact or frication production, it is nontrivial to determine the “place” of articulation of a vowel. Furthermore, classification of vowels was until relatively recently accomplished solely by ear, so comparison between vowels has been heavily influenced by their acoustic and auditory properties. Nonetheless, vowels are typically classified by so-called height (minimally high, mid, low) and backness (front, central, back), though additional intermediate terms for height are quite common. Languages sometimes employ an additional dimension for vowels, which involves finer distinctions in height and backness that sometimes correlate with tongue root position (advanced tongue root). Vowels are particularly susceptible to influences of neighboring consonants, especially those involving the tongue back and root, and this influence, a coarticulatory effect, often appears in vowel descriptions such as rhoticization (sounding like an r) or pharyngealization (with the low back tongue position typical of a pharyngeal articulation) (see the next section). In addition, vowel articulation can include nasalization plus laryngeal adjustments including variations in mode and rate of vocal fold vibration (see Anatomy for speech sounds).
Descriptive terms and symbols for the sounds they describe are often organized into charts, which provide a quick overview of the descriptive dimensions involved. The International Phonetic Association provides symbol charts and a wealth of additional resources on phonetics and transcription including the Handbook of the IPA. In addition, excellent in-depth discussions of the set of terms needed to describe sounds and distinguish sounds accurately across languages are available (e.g., Catford, 1982; Laver, 1994) as well as the inventory of sounds observed in the world’s languages (Ladefoged & Maddieson, 1996).
5. Speech Dynamism and Variability
The basic literature on articulatory phonetics is dominated by a segmental perspective, in which sounds are described as individual phoneme-sized units that abstract away from the dynamic nature of speech. On the one hand, this is understandable, since articulatory phonetics is often taught in the context of training in transcription, or interpretation of transcriptions, an approach that emphasizes the focus on a segment- or phoneme-sized level of analysis. On the other hand, articulation is clearly predominantly dynamic. Production of speech sounds requires the coordination of multiple articulators, and these articulators are rarely stationary.
See, for example, the short X-ray film illustrating the moving articulators while acoustician Kenneth Stevens produces nonsense items illustrating English consonants and vowels, plus two short sentences. The dynamic nature of speech has always caused tension with segment-oriented descriptive methods such as phonetic transcription and with theories of speech representation that put heavy emphasis on phoneme-sized units. As a phonetics student once put it, “transcription is like describing the world using only integers.” In reality, the articulators move nearly continuously, and the way they move varies in accordance with a vast array of factors, from identity of an adjacent sound, to position of a sound within larger linguistic units such as syllables or words, to more global considerations like speech rate or pragmatic context.
Variability in speech sounds can be organized into roughly two general classes of effects, which we may call coordinative and hierarchical (which are similar to, but not equivalent to, the common terms segmental and suprasegmental). Coordinative effects concern the influence of sounds on neighboring sounds in the same word or phrase. These may also be thought of as a kind of syntagmatic effect. Hierarchical effects concern differences in the production of the “same” sound in different positions in linguistic units such as the word or syllable, or various types of prosodic units (Frota, 2012). Effects of stress will also be included in this category, since stress involves relative differences in strength or prominence akin to prosodic domain effects.
Considering coordinative effects first, we can see that since successive sounds are produced with the same set of articulators, in different combinations, and the articulators move more or less continuously, rather than in discrete steps, it follows that the production of one sound will be influenced by production of neighboring sounds. While making a sound, we begin preparing for the next sound, and likewise the articulation of one sound will affect the way the articulators move toward a following sound. The most general term for this is coproduction, though a more common alternative is coarticulation. Sometimes a distinction is drawn between coarticulation, conceived of as described above, and assimilation, which is taken to be a discrete, segment-changing, phonological effect. The term coarticulation, then, evokes a theoretical distinction between phonetic facts and phonological facts that in practice can be hard to maintain (see, e.g., Scobbie, 2007). In either case, the key insight is that there is overlap, or influence, between neighboring sounds. Commonly acknowledged examples of coproduction include nasalization of a vowel preceding a nasal consonant (compare the vowels in “bead” and “bean”) and the rounding of the lips during a consonant that precedes a rounded vowel (compare “seat” and “suit”). There is even evidence that in certain cases these effects can extend beyond immediately adjacent sounds, as in longer-distance anticipation of retroflexion later in a phrase (West, 1999). Another kind of local coproduction effect is articulator shift, as when a velar is produced more front before a front vowel (compare “keep” and “coop”) or when a consonant shifts its place of articulation under influence of an adjacent consonant made with the same articulator (compare the “n’s” in “tenth” and “lunch”). 
Crucially, these and other co-articulatory effects have been found to differ across languages (Cohn, 1990; Beddor, 2009), raising the question of how this interaction between nearby sounds is learned, controlled, and represented.
Hierarchical effects reflect variations in articulation related to the position of a sound in a higher-order linguistic structure. Some of these effects have been codified as recognized allophones, such as aspiration occurring on word-initial voiceless stops in English or the strong tongue backing gesture referred to as velarization in syllable codas, which distinguishes the “l” in “fell” from the “l” in “left.” Others involve more subtle differences, some of which are readily detected and are sometimes noted in impressionistic phonetic descriptions, such as a tendency to lengthen segments before phrase boundaries (e.g., Wightman, Shattuck-Hufnagel, Ostendorf, & Price, 1992; Byrd & Saltzman, 2003). Still others are subtler, such as the increase in articulatory contact and/or duration for “t” in utterance initial position reported by Cho and Keating (2001) for Korean and by Cho and Keating (2009) for English. There is also evidence that these articulatory effects produce acoustic differences that listeners can use in word segmentation (e.g., McQueen & Cho, 2003). Prosodic effects on articulation vary across different segments and different languages (Cho, 2015). See also Cho, Grice, and Mücke (2014), a special issue of the Journal of Phonetics on Dynamics of articulation and prosodic structure.
Effects of word stress or phrasal accent can also be characterized as hierarchical effects, since the domain of both involves stretches of speech larger than an individual sound, though a better term, especially for these, might simply be nonlocal effects. Influences of stress on articulation include increase in consonant closure duration and increase in vowel openness (Cho & Keating, 2009). Furthermore, Beckman, Edwards, and Fletcher (1992) found a decrease in vowel–consonant gesture overlap for phrase-accented vowels. Speech rate, like stress and accent, also affects larger stretches of sounds, though rate effects are not generally associated with specific linguistic units to the exclusion of others. Speech rate may change extent or speed of gestures, including lip or jaw velocity or displacement (Lindblom, 1963; Beckman et al., 1992; Moon & Lindblom, 1994; Shaiman, 2001), or it may affect intergestural timing. For example, higher speech rate can increase overlap of labial and tongue gestures (Engstrand, 1988; see also Munhall & Löfqvist, 1992).
Theorists debate how much of the dynamicity of articulation is critical to linguistic representations. Since this debate cannot be resolved here, we have maintained the common segmental perspective as a means of organizing discussion of the essentially variable nature of speech, which is central to the core research questions in articulatory phonetics. (For further discussion of articulatory detail in linguistic representations, see Theories addressing articulation and linguistic representation.)
6. Research in Articulatory Phonetics
A diversity of approaches is used to develop informative accounts of the variability in speech sounds and to tease apart the physical and the linguistic contributions to the articulatory fabric of sounds in the world’s languages. As noted above, research in a wide variety of fields bears on the questions of concern in articulatory phonetics. Foundational work in the field has been done, for example, by speech scientists, linguists, psychologists, and electrical engineers, to name a few. At the concrete anatomical level, questions of interest include the range of variability of human vocal tracts, the movements that may be considered more natural or easier to produce, and any limiting conditions on articulator coordination. At a slightly more abstract level, key questions are concerned with how speech production is planned and controlled and the role of linguistic structures in determining the course of articulation. Research in recent years has even considered how speech is coordinated with nonspeech gesture, including facial movements, hand and arm gestures, and even torso and body movements (e.g., Esteve-Gibert, Pons, Bosch, & Prieto, 2014; Wagner, Malisz & Kopp, 2014; and the February 2014 special issue of Speech Communication on gesture and speech in interaction). Here we will briefly survey research methods in the field, including data collection methods, models, and theories pertaining particularly to linguistic representation of, and control of, speech production.
6.1. Sources of data
Research on speech articulation must contend with a variety of linguistic, social, and situational variables that can affect the data gathered. Traditionally, data was gathered in a laboratory setting, with subjects pronouncing words in isolation from a word list or possibly in a sentence frame to regularize the effects of stress and intonation which otherwise tend to alternate in a word list. As more compact signal processing devices have been developed, it has been easier to collect data outside the lab, even in rural and remote areas. Ladefoged (2003) and Bird and Gick (2005) describe techniques for doing phonetic fieldwork (see also the next section). Speech style is still a critical variable, with researchers debating the relative merits of “laboratory speech” versus spontaneous conversation or the popular compromise: task-oriented interactive dialogue (e.g., the MapTask). Over time, more attention has been directed at controlling (or directly studying) individual sociolinguistic variables, such as dialect, race, class, and sexual orientation (Cohn & Huffman, 2014; Docherty & Mendoza-Denton, 2012; Warren & Hay, 2012). Beyond these considerations, at the linguistic level, the sorts of analytic comparisons made include comparing similar sounds across multiple languages, comparing the “same” sounds in multiple linguistic contexts, and examining the same sounds in multiple communicative situations, from spontaneous dialogue to imitated speech and even elicited speech errors.
6.2. Instruments for data collection
A variety of instrumental techniques are currently available to study articulation. Since acoustic recordings are the least invasive and the easiest type of data to collect, much work in this area has been based on inference from analysis of audio data. For languages spoken in areas far from speech laboratories, the range and type of data beyond acoustic recordings that can be collected are somewhat limited, though articulatory and aerodynamic data are sometimes obtained. X-rays in the past provided good resolution of individual articulators and information about articulatory movements in a two-dimensional space, which could be associated with a parallel acoustic recording. Efforts to develop safe alternatives have made considerable progress. Ultrasound can reveal the shape and movement of most of the tongue (constrained somewhat by individual anatomical variability), and simultaneous audio recordings can be made. MRI can resolve most speech structures very well, but good-quality simultaneous audio is not possible due to machine noise, and the speaker must be lying down. Pellet tracking systems such as electromagnetic midsagittal articulometry (EMA or EMMA; e.g., Perkell et al., 1992) provide good temporal resolution of speech gestures and allow for simultaneous audio recording. For review of articulator imaging and tracking techniques, see Davidson (2012) and contributions in Harrington and Tabain (2006).
More detailed information about tongue contact location, extent, and timing can be gathered using electropalatography, in which a retainer-like device embedded with electrodes is held in the mouth during speech. At present such devices can only be used with a very limited number of subjects, as the retainer must be fit to the individual speaker, but they do have the advantage of providing time-varying data at a decent rate of resolution and good-quality simultaneous audio. For static samples of overall tongue contact, static palatography can be used, in which a dark substance is “painted” on the tongue or palate; after articulation of a sound, the contact pattern can be observed by determining where the substance was transferred to the other articulator (Dart, 1998; Anderson, 2008). See also Ladefoged (2003) for description of this technique as well as aerodynamic and acoustic recording techniques for a fieldwork setting.
Aerodynamic data can be used to infer changes in velum position and the positions of the vocal folds. Estimation of velum position, or, more accurately, estimated velopharyngeal port aperture, comes by a number of means. One is to measure nasal airflow on its own; alternatively, one can measure both nasal and oral airflow with a split mask, using oral airflow to estimate more accurately which changes in nasal airflow are due to velopharyngeal port opening and which are due to oral constriction (which would favor higher nasal airflow independent of changes in velopharyngeal port opening). For more information on measuring nasality, see Krakow and Huffman (1992). Inverse filtering is a process by which the glottal waveform is deduced by estimating the contribution of the vocal tract to the volume velocity output, as registered with an airflow mask, and then “subtracting” that from the output spectrum, with the remaining spectral components being considered the contribution of movements at the larynx. Analysis by synthesis or other types of articulatory to acoustic modeling can then estimate the shape of the glottal flow waveform, which in turn gives an indication of the pattern of vocal fold vibration. Vocal fold function can also be assessed using electroglottography (EGG), in which electrodes on either side of the thyroid cartilage measure the current passing through the larynx, which is higher when more laryngeal tissue is in contact. EGG can be used to estimate the mode of vibration of the vocal folds and to assess phonation type differences such as the difference between normal voice and so-called creaky voice, which is used contrastively in some languages and is a prosodic element marking phrase boundaries and/or low tone in many languages (e.g., Redi & Shattuck-Hufnagel, 2001; Hanson, 2012).
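The core idea of inverse filtering, estimate the vocal tract's filtering contribution and "subtract" it to recover an approximation of the source, can be sketched in a few lines. The toy below substitutes a white-noise source and a single known resonance for a real glottal pulse train and airflow-mask recording, and uses autocorrelation-method linear prediction (a standard filter-estimation technique, not the analysis-by-synthesis methods the text describes); it is a sketch under those assumptions, not a practical voice-analysis tool:

```python
import numpy as np

def lpc(signal, order):
    """Estimate all-pole filter coefficients a[0..order] (a[0] = 1) by the
    autocorrelation method, via the Levinson-Durbin recursion."""
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= (1.0 - k * k)
    return a

def allpole(excitation, a):
    """Run a source signal through 1/A(z) (a crude 'vocal tract' filter)."""
    y = np.zeros(len(excitation))
    for n in range(len(y)):
        y[n] = excitation[n] - sum(a[k] * y[n - k]
                                   for k in range(1, len(a)) if n >= k)
    return y

# Hypothetical stand-ins: white noise as the source, one known resonance
# as the vocal tract.
rng = np.random.default_rng(0)
true_a = np.array([1.0, -0.9, 0.64])        # stable 2nd-order resonance
excitation = rng.standard_normal(4000)
speechlike = allpole(excitation, true_a)

# Inverse filter: estimate A(z) from the output alone, then apply it
# (an FIR convolution) to recover an approximation of the source.
est_a = lpc(speechlike, order=2)
residual = np.convolve(speechlike, est_a)[:len(speechlike)]
```

With the filter estimated well, the residual approximates the original excitation, which is the sense in which the vocal tract's contribution has been "subtracted" from the output.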
6.3. Articulatory modeling
The goal of articulatory modeling is to understand the way in which articulators are coordinated in production of speech, including how they are used together to achieve speech production targets and how their preferred modes of movement and coordination might delimit the nature of human speech sounds, including which sounds and sound combinations are common. (Löfqvist (2010) provides an overview of issues in speech production modeling.) The larynx and tongue, as the most complex moving articulators, have been modeled using many methods. Models of the larynx range from one-dimensional models to those with as many as 16 degrees of freedom (e.g., Fujita, Dang, Suzuki, & Honda, 2007; see also papers in Redford, 2015). Models of the tongue now include time-varying representations based on flesh point data or sagittal tracings from X-rays or MRI images and representations of tissue and muscle properties (e.g., Sanguineti, Laboissiere, & Payan, 1997; see also papers in Harrington & Tabain, 2006, and a review of data acquisition and modeling techniques in Hiiemae & Palmer, 2003). Three-dimensional mechanical models such as Artisynth have reached a level of sophistication and accessibility great enough that they can contribute to tests of specific hypotheses about speech articulators and preferred modes of movement, thus providing for major advances in answering questions about which aspects of speech are motivated by properties of the articulatory system. Stavness, Gick, Derrick, and Fels (2012) is an example of articulatory modeling applied to the question of explaining common gestural coordination patterns, in this case the tendency of different English “r” variants to co-occur with specific vowels. Ultimately, models must account for essential aspects of speech articulation and how they generate the acoustic waveform that is the outcome of human speech. Here we briefly mention a few models that produce acoustic output and have been used in linguistic research.
HLsyn (by Sensimetrics) is a parametrized system for driving a formant synthesizer. It is quasi-articulatory in that the values of some of the synthesis control parameters are constrained by dependencies between them that result from vocal tract structure and speech aerodynamics and acoustics (a brief overview is included in, e.g., Hanson & Stevens, 2002). TADA (from Haskins Laboratories) is a system for generating synthesizer control variables from gestural “scores” containing information about articulators, their properties, and movement goals. It is an implementation of the representations posited in the theory of articulatory phonology (see “Theories addressing articulation and linguistic representation”). The output of TADA produces an acoustic waveform through formant synthesis via HLsyn.
A range of influential theories have addressed different subsets of these questions, as reviewed in the next section.
6.4. Theories addressing articulation and linguistic representation
As discussed above, one challenge for theories concerned with articulatory phonetics is the vast variability among tokens of what is taken to be the “same” speech sound. The vowel in “cat” will vary in height and frontness across dialects, but it will also vary somewhat from the vowel in “can” because of anticipation of the nasal consonant. Furthermore, it will vary with speech rate, speech style, and more personal physical properties of the speaker, from age to head size to sex. Amid all this variability, how do languages encode a consistent system relating sounds to meaning? If some of the variability can be explained by language-external principles, then less must be encoded in linguistic representations. Some of the most influential theories in this area provide explanations for patterns in phonetic variation. One such theory is Lindblom’s (1990) hyperarticulation and hypoarticulation (H&H) theory, which holds, roughly, that for a given articulatory target (an abstraction, since it is a target, not an achieved position), a general speech-mode parameter of strength of articulation determines whether that target is reached, exceeded, or “undershot.” It is implicit in this view that an articulator’s starting position will also contribute to how the articulatory target is achieved, so the theory acknowledges the essentially coordinative nature of speech. Another influential theory, Stevens’s quantal theory (Stevens, 1972), focuses more on the acoustic output and on nonlinearities in the articulatory-to-acoustic transduction process. The key insight here is that for certain combinations of articulators, precision in articulator positioning is not needed, because small differences in degree or place of articulation have little acoustic effect, thus generating regions of stability in the articulatory–acoustic relation.
Conversely, between these regions of stability, there are regions where a small articulatory change has a big acoustic effect. In sum, quantal theory provides an explanation for two important properties of an information conveyance system: strong markers of boundaries, or distinctions, between units of the code and consistency in the physical representation of individual units (see also Iskarous, 2012).
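The contrast between stable plateaus and steep transitions can be illustrated with a toy articulatory-to-acoustic map. The sigmoid below is an invented stand-in, not a model from the phonetics literature; it simply shows how local acoustic sensitivity to articulatory perturbation is small inside a stable region and large at the quantal boundary.

```python
import math

def acoustic_output(x):
    """Toy articulatory-to-acoustic map (hypothetical): a sigmoid with
    flat plateaus at both ends and a steep transition in the middle."""
    return 1.0 / (1.0 + math.exp(-12.0 * (x - 0.5)))

def sensitivity(x, dx=1e-4):
    """Local acoustic effect of a small articulatory perturbation at x,
    estimated by a central difference."""
    return abs(acoustic_output(x + dx) - acoustic_output(x - dx)) / (2 * dx)

# Inside the plateaus, articulatory imprecision has little acoustic effect;
# near the transition, a small articulatory change has a large one.
print(sensitivity(0.1))   # stable region: low sensitivity
print(sensitivity(0.5))   # quantal boundary: high sensitivity
print(sensitivity(0.9))   # stable region: low sensitivity
```

On this view a language gains robustness by placing its targets inside the plateaus, where speaker-to-speaker and token-to-token articulatory variation is acoustically cheap.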
As just demonstrated, much theorizing in articulatory phonetics is founded on the assumption that linguistic representations are abstract symbolic or featural codes that are then “implemented” by the speech articulators. The mapping between these discrete representational units and the essentially continuously changing world of articulators and sound waves is then a primary theoretical challenge. Which aspects of this mapping are automatic, following from the way that articulators work, the aerodynamic consequences of articulator function, and the basic physics of speech, and which aspects are under linguistic control? For those aspects under linguistic control, does the envisioned abstract code contain all the information necessary to account for the linguistically important aspects of the output? As decades of research documenting the phonetic properties of individual languages accumulated, it became clear that the amount of phonetic detail differing between languages could not be coded in the sorts of phonemic or featural representations normally envisioned as being part of the grammar (see also Cohn & Huffman, 2014).
Different solutions to this problem have been developed. One solution is to recognize a separate “phonetic” component which embellishes the basic abstract “phonological” representation with detail about specific phonetic targets and allowable variation for a language. For example, the phonetic targets for /a/ in English and Spanish would differ (Spanish /a/ is fronter than most /a/’s in American English), and the range of acceptable variation would also differ (English tolerates more stress-conditioned spectral variability in vowels than Spanish does). This general approach is often referred to as “phonetic implementation”: a phonological representation is “realized” through the addition of phonetic detail. The result is still normally seen as a set of somewhat abstract targets which must be converted to an articulatory plan. Another solution is to envision linguistic representations as manipulating articulatorily relevant parameters from the start, thus circumventing the discrete-representation-to-continuous-movement translation problem. The theory of articulatory phonology (e.g., Browman & Goldstein, 1992) holds that linguistic representations are articulatory representations, described as gestural scores, which include information about the target degree and location of constrictions, as well as information about the coordination of different articulators and about articulator kinematics, such as gesture stiffness. The phonetic variations previously described as changes in feature values, addition of feature specifications, deletions, effects of syllable structure, stress, etc., are all hypothesized to have a source within the speech plan, or gestural score, combined with a model of how movement arises from the score.
Hypothesized gestural scores can be used by a task-dynamic model of articulator movement (e.g., Saltzman & Kelso, 1987), combined with an articulatory synthesizer (e.g., Rubin et al., 1981), to produce outputs that can then be tested against articulatory data in an analysis-by-synthesis approach.
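The task-dynamic treatment of a single gesture as a critically damped mass-spring system can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Saltzman and Kelso implementation: the stiffness and duration values are arbitrary, and a real gestural score would coordinate many such systems. Truncating the activation interval, as at a fast speech rate, yields target undershoot of the kind H&H theory describes, without any change to the target itself.

```python
import math

def gesture_trajectory(start, target, stiffness, dt=0.001, duration=0.3):
    """Simulate one gesture as a critically damped mass-spring system:
    the articulator is driven toward the target with damping
    b = 2*sqrt(k), so it approaches the target without oscillating.
    All parameter values here are illustrative, not fitted to data."""
    x, v = start, 0.0
    b = 2.0 * math.sqrt(stiffness)          # critical damping
    trajectory = [x]
    for _ in range(int(duration / dt)):     # simple Euler integration
        a = -stiffness * (x - target) - b * v
        v += a * dt
        x += v * dt
        trajectory.append(x)
    return trajectory

# A short activation interval (fast speech) truncates the movement,
# producing target undershoot with no change in the target itself.
slow = gesture_trajectory(0.0, 1.0, stiffness=400.0, duration=0.30)
fast = gesture_trajectory(0.0, 1.0, stiffness=400.0, duration=0.08)
print(round(slow[-1], 3))   # near the target
print(round(fast[-1], 3))   # undershoot
```

The same mechanism makes gestural overlap natural to model: two concurrently active systems simply pull their shared articulator toward different targets at once.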
Articulatory phonology has provided novel and insightful accounts of a variety of phenomena. For example, many types of sound variation that had previously been treated as cases of deletion or discrete change in phonological features have been shown to be accounted for by reference to changes in gestural coordination or articulatory movement variables. One traditional example is the phrase “perfec(t) memory,” which English speakers can, and often do, say without an audible [t] (Browman & Goldstein, 1991). This had been described as deletion of a /t/ phoneme, but in reality many productions of this phrase involve a hidden or low-amplitude tongue movement for /t/. Many cases of sound “change” in differing prosodic conditions, or changes associated with speech style, also find a natural account in terms of changes in the dynamical properties of gestures, including the degree and timing of gestural overlap and gesture amplitude (Browman & Goldstein, 1992). Cross-linguistic differences in fine phonetic detail are not a separate or secondary problem in articulatory phonology. The notion that the “same” sound is phonetically different in different languages is reflected simply in the fact that two languages may employ the same articulator while varying in their target constriction locations and degrees, with properties of the articulators defining the boundary constraints on what is possible. On the other hand, perceptual effects on speech sound inventories, or on the articulation of individual speech sounds, do not have a natural account in articulatory phonology. These challenges are most evident with vowels and resonants like English “r,” for which it seems that individual speakers can combine a variety of articulatory movements to achieve what may be an acoustic or perceptual goal rather than a clear articulatory goal (e.g., Nieto-Castanon, Guenther, Perkell, & Curtin, 2005).
7. Future Directions
Articulatory phonetics gains insights from, and contributes to, a diverse set of domains of knowledge, from psycholinguistics to the physics of singing. As such, developments in a wide variety of fields can bring new progress in addressing the core questions of the field, which concern the way speech is planned and executed and the way that properties of the speech mechanism contribute to the articulatory and acoustic fabric of the sounds of the world’s languages. The relatively rapid pace of improvements in articulator and brain imaging techniques currently taking place promises to provide substantially larger and richer sets of data in the near future. This progress, along with continued advances in the modeling of articulation and of the neural control of speaking, promises to make possible the investigation of more specific and revealing hypotheses about the nature of spoken language.
Links to Digital Materials
Bird, S., & Gick, B. (2005). Phonetics: Field methods. In K. Brown (Ed.), Encyclopedia of language and linguistics (2nd ed., Vol. 9, pp. 463–467). Oxford: Elsevier.
Browman, C., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49(3–4), 155–190.
Catford, J. C. (1982). Fundamental problems in phonetics. Edinburgh: Edinburgh University Press.
Gick, B., Wilson, I., & Derrick, D. (2013). Articulatory phonetics. Malden, MA: Wiley-Blackwell.
Harrington, J., & Tabain, M. (2006). Speech production: Models, phonetic processes, and techniques. New York: Psychology Press.
Ladefoged, P. (2003). Phonetic data analysis. Malden, MA: Wiley-Blackwell.
Ladefoged, P., & Maddieson, I. (1996). The sounds of the world’s languages. Cambridge, MA: Blackwell.
Redford, M. (2015). Handbook of speech production. Malden, MA: Wiley-Blackwell.
References
Anderson, V. B. (2008). Static palatography for language fieldwork. Language Documentation & Conservation, 2(1), 1–27.
Beckman, M. E., Edwards, J., & Fletcher, J. (1992). Prosodic structure and tempo in a sonority model of articulatory dynamics. In G. J. Docherty & D. R. Ladd (Eds.), Papers in laboratory phonology II: Segment, gesture, prosody (pp. 68–86). Cambridge, U.K.: Cambridge University Press.
Beckman, M. E., & Kingston, J. (2012). In A. C. Cohn, C. Fougeron, & M. K. Huffman (Eds.), The Oxford handbook of laboratory phonology (pp. 10–16). Oxford: Oxford University Press.
Beddor, P. (2009). A coarticulatory path to sound change. Language, 85(4), 785–821.
Browman, C., & Goldstein, L. (1991). Tiers in articulatory phonology, with some implications for casual speech. In J. Kingston & M. E. Beckman (Eds.), Papers in laboratory phonology I: Between the grammar and the physics of speech (pp. 341–376). Cambridge, U.K.: Cambridge University Press.
Browman, C., & Goldstein, L. (1992). Articulatory phonology: An overview. Phonetica, 49(3–4), 155–190.
Byrd, D., & Saltzman, E. (2003). The elastic phrase: Modeling the dynamics of boundary-adjacent lengthening. Journal of Phonetics, 31, 149–180.
Catford, J. C. (1982). Fundamental problems in phonetics. Edinburgh: Edinburgh University Press.
Cho, T. (2015). Language effects on timing at the segmental and suprasegmental levels. In M. A. Redford (Ed.), The handbook of speech production (pp. 505–529). Hoboken, NJ: Wiley-Blackwell.
Cho, T., Grice, M., & Mücke, M. (Eds.). (2014). Dynamics of articulation and prosodic structure [Special issue]. Journal of Phonetics, 44.
Cho, T., & Keating, P. (2001). Articulatory and acoustic studies of domain-initial strengthening in Korean. Journal of Phonetics, 29, 155–190.
Cho, T., & Keating, P. (2009). Effects of initial position versus prominence in English. Journal of Phonetics, 37(4), 466–485.
Cohn, A. C. (1990). Phonetic and phonological rules of nasalization. UCLA Working Papers in Phonetics 76. Los Angeles: UCLA.
Cohn, A., Fougeron, C., & Huffman, M. (Eds.). (2012). The Oxford handbook of laboratory phonology. Oxford: Oxford University Press.
Cohn, A., & Huffman, M. (2014). The interface between phonology and phonetics. Oxford Bibliographies Online. Oxford: Oxford University Press.
Dart, S. N. (1998). Comparing French and English coronal consonant articulation. Journal of Phonetics, 26, 71–94.
Davidson, L. (2012). Ultrasound as a tool for speech research. In A. C. Cohn, C. Fougeron, & M. K. Huffman (Eds.), The Oxford handbook of laboratory phonology (pp. 484–496). Oxford: Oxford University Press.
Docherty, G., & Mendoza-Denton, N. (2012). Speaker-related variation. In A. C. Cohn, C. Fougeron, & M. K. Huffman (Eds.), The Oxford handbook of laboratory phonology (pp. 44–60). Oxford: Oxford University Press.
Engstrand, O. (1988). Articulatory correlates of stress and speaking rate in Swedish VCV utterances. Journal of the Acoustical Society of America, 83(5), 1863–1875.
Esteve-Gibert, N., Pons, F., Bosch, L., & Prieto, P. (2014). Are gesture and prosodic prominences always coordinated? Evidence from perception and production. In N. Campbell, D. Gibbon, & E. Hirst (Eds.), Proceedings of Speech Prosody 2014. Dublin.
Frota, S. (2012). Prosodic structure, constituents, and their implementation. In A. C. Cohn, C. Fougeron, & M. K. Huffman (Eds.), The Oxford handbook of laboratory phonology (pp. 255–265). Oxford: Oxford University Press.
Fujita, S., Dang, J., Suzuki, N., & Honda, K. (2007). A computational tongue model and its clinical application. Oral Science International, 4, 97–109.
Gick, B., Wilson, I., & Derrick, D. (2013). Articulatory phonetics. Malden, MA: Wiley-Blackwell.
Hall, D., & Mielke, J. (2011). Distinctive features. In Oxford Bibliographies in Linguistics. Oxford: Oxford University Press.
Hanson, H. (2012). Methodologies used to investigate laryngeal function and aerodynamic properties of speech. In A. C. Cohn, C. Fougeron, & M. K. Huffman (Eds.), The Oxford handbook of laboratory phonology (pp. 496–511). Oxford: Oxford University Press.
Hanson, H., & Stevens, K. (2002). A quasi-articulatory approach to controlling acoustic source parameters in a Klatt-type formant synthesizer using HLsyn. Journal of the Acoustical Society of America, 112(3), 1158–1182.
Harrington, J., & Tabain, M. (2006). Speech production: Models, phonetic processes, and techniques. New York: Psychology Press.
Hiiemae, K., & Palmer, J. (2003). Tongue movements in feeding and speech. Critical Reviews in Oral Biology & Medicine, 14, 413–429.
Huffman, M. (2011). Articulatory phonetics. In Oxford Bibliographies in Linguistics. Oxford: Oxford University Press.
Huffman, M., & Cohn, A. (2014). The interface between phonology and phonetics. In Oxford Bibliographies in Linguistics. Oxford: Oxford University Press.
Iskarous, K. (2012). Articulatory to acoustic modeling. In A. C. Cohn, C. Fougeron, & M. K. Huffman (Eds.), The Oxford handbook of laboratory phonology (pp. 472–483). Oxford: Oxford University Press.
Krakow, R., & Huffman, M. (1992). Instruments and techniques for investigating nasalization and velopharyngeal function in the laboratory: An introduction. In M. Huffman & R. Krakow (Eds.), Phonetics and phonology 5: Nasals, nasalization and the velum (pp. 3–59). San Diego, CA: Academic Press.
Ladefoged, P. (2003). Phonetic data analysis. Malden, MA: Wiley-Blackwell.
Ladefoged, P., & Maddieson, I. (1996). The sounds of the world’s languages. Cambridge, MA: Blackwell.
Laver, J. (1994). Principles of phonetics. Cambridge, U.K.: Cambridge University Press.
Lindblom, B. (1963). Spectrographic study of vowel reduction. Journal of the Acoustical Society of America, 35(11), 1773–1781.
Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H & H theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech production and speech modeling (pp. 403–439). Dordrecht, The Netherlands: Kluwer.
Löfqvist, A. (2010). Theories and models of speech production. In W. J. Hardcastle, J. Laver, & F. E. Gibbon (Eds.), The handbook of phonetic sciences (2nd ed., pp. 353–357). Oxford, U.K.: Blackwell.
Maddieson, I. (1984). Patterns of sounds. Cambridge, U.K.: Cambridge University Press.
McQueen, J., & Cho, T. (2003). The use of domain-initial strengthening in segmentation of continuous English speech. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences (ICPhS). Adelaide, Australia: Causal Productions.
Moon, S., & Lindblom, B. (1994). Interaction between duration, context, and speaking style in English stressed vowels. Journal of the Acoustical Society of America, 96(1), 40–55.
Munhall, K., & Löfqvist, A. (1992). Gestural aggregation in speech: Laryngeal gestures. Journal of Phonetics, 20, 111–126.
Nieto-Castanon, A., Guenther, F. H., Perkell, J. S., & Curtin, H. D. (2005). A modeling investigation of articulatory variability and acoustic stability during American English /r/ production. Journal of the Acoustical Society of America, 117(5), 3196–3212.
Perkell, J. S., Cohen, M. H., Svirsky, M. A., Matthies, M. L., Garabieta, I., & Jackson, M. T. T. (1992). Electromagnetic midsagittal articulometer systems for transducing speech articulatory movements. Journal of the Acoustical Society of America, 92, 3078–3096.
Redford, M. (2015). Handbook of speech production. Malden, MA: Wiley-Blackwell.
Redi, L., & Shattuck-Hufnagel, S. (2001). Variation in realization of glottalization in normal speakers. Journal of Phonetics, 29, 407–429.
Reetz, H., & Jongman, A. (2011). Phonetics: Transcription, production, acoustics, and perception. Malden, MA: Wiley-Blackwell.
Rubin, P., Baer, T., & Mermelstein, P. (1981). An articulatory synthesizer for perceptual research. Journal of the Acoustical Society of America, 70(2), 321–328.
Saltzman, E., & Kelso, J. A. S. (1987). Skilled actions: A task dynamic approach. Psychological Review, 94(1), 84–106.
Sanguineti, V., Laboissiere, R., & Payan, Y. (1997). A control model of human tongue movements in speech. Biological Cybernetics, 77(1), 11–22.
Scobbie, J. M. (2007). Interface and overlap in phonetics and phonology. In G. Ramchand & C. Reiss (Eds.), Oxford handbook of linguistic interfaces (pp. 17–52). Oxford: Oxford University Press.
Shaiman, S. (2001). Kinematics of compensatory vowel shortening: The effect of speaking rate and coda composition on intra- and inter-articulatory timing. Journal of Phonetics, 29, 89–107.
Stavness, I., Gick, B., Derrick, D., & Fels, S. (2012). Biomechanical modeling of English /r/ variants. Journal of the Acoustical Society of America, 131, EL355–EL360.
Stevens, K. N. (1972). The quantal nature of speech: Evidence from articulatory-acoustic data. In P. B. Denes & E. E. David Jr. (Eds.), Human communication: A unified view (pp. 51–66). New York: McGraw-Hill.
Wagner, P., Malisz, Z., & Kopp, S. (2014). Gesture and speech in interaction: An overview. Speech Communication, 57, 209–232.
Warren, P., & Hay, J. (2012). Methods and experimental design for studying sociophonetic variation. In A. C. Cohn, C. Fougeron, & M. K. Huffman (Eds.), The Oxford handbook of laboratory phonology (pp. 634–642). Oxford: Oxford University Press.
West, P. (1999). Perception of distributed coarticulatory properties of English /l/ and /r/. Journal of Phonetics, 27(4), 405–426.
Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M., & Price, P. (1992). Segmental durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical Society of America, 91, 1707–1717.