Summary and Keywords
Computational semantics performs automatic meaning analysis of natural language. Research in computational semantics designs meaning representations and develops mechanisms for automatically assigning those representations and reasoning over them. Computational semantics is not a single monolithic task but consists of many subtasks, including word sense disambiguation, multi-word expression analysis, semantic role labeling, the construction of sentence semantic structure, coreference resolution, and the automatic induction of semantic information from data.
The development of manually constructed resources has been vastly important in driving the field forward. Examples include WordNet, PropBank, FrameNet, VerbNet, and TimeBank. These resources specify the linguistic structures to be targeted in automatic analysis, and they provide high-quality human-generated data that can be used to train machine learning systems. Supervised machine learning based on manually constructed resources is a widely used technique.
A second core strand has been the induction of lexical knowledge from text data. For example, words can be represented through the contexts in which they appear (called distributional vectors or embeddings), such that semantically similar words have similar representations. Or semantic relations between words can be inferred from patterns of words that link them. Wide-coverage semantic analysis always needs more data, both lexical knowledge and world knowledge, and automatic induction at least alleviates the problem.
Compositionality is a third core theme: the systematic construction of structural meaning representations of larger expressions from the meaning representations of their parts. The representations typically use logics of varying expressivity, which makes them well suited to performing automatic inferences with theorem provers.
Manual specification and automatic acquisition of knowledge are closely intertwined. Manually created resources are automatically extended or merged. The automatic induction of semantic information is guided and constrained by manually specified information, which is much more reliable. And for restricted domains, the construction of logical representations is learned from data.
It is at the intersection of manual specification and machine learning that some of the current larger questions of computational semantics are located. For instance, should we build general-purpose semantic representations, or is lexical knowledge simply too domain-specific, and would we be better off learning task-specific representations every time? When performing inference, is it more beneficial to have the solid ground of a human-generated ontology, or is it better to reason directly with text snippets for more fine-grained and gradual inference? Do we obtain a better and deeper semantic analysis as we use better and deeper manually specified linguistic knowledge, or is the future in powerful learning paradigms that learn to carry out an entire task from natural language input and output alone, without pre-specified linguistic knowledge?
1. Phenomena and Tasks
Computational semantics is an area of computational linguistics that is about automatically analyzing the meaning of natural language. Research topics in computational semantics include the design of appropriate meaning representations, the automatic analysis of text into those meaning representations, and reasoning over meaning representations. Computational semantics includes analyses at the level of individual words, phrases, sentences, and stretches of discourse, with a wide variety of methods used. In order to keep the article reasonably focused, it has been restricted to the levels of lexical meaning and sentence meaning.
Even with this restriction, there is a wide variety of phenomena that are involved in the construction of meaning representations. Some of them are at the lexical level; they concern the meanings of words or short expressions and the relations between them. Others are at the sentence level and concern the overall structure of a sentence and the way its parts are linked. The aim of this section is to sketch some important phenomena at both the lexical level and the sentence level, to point out what makes them difficult, and to list core computational tasks linked to each phenomenon.
1.1 Lexical Phenomena
Table 1 shows examples of some phenomena at the lexical level. As panel (A) of the table shows, a word can have more than one sense; it can be polysemous, like star in (1): This sentence must refer to the “well-known person” sense of star (as it talks about the star getting married), but the word astronomer cues the “celestial object” sense. A word is called a homonym if its senses are completely unrelated, as in the case of bat, and polysemous if it has multiple related senses. In some cases, it is difficult to decide whether two uses should be one sense or two (Kilgarriff, 1997): Should (2) and (3) count as two separate senses, meaning demonstrate versus exhibit, or should they both be grouped under a broader demonstrate sense? Additionally, sense relatedness seems to be a matter of degree, with no clear boundary between closely related and less closely related senses (Cruse, 1995; Brown, 2008). Word sense disambiguation is the task of deciding which sense, out of a given list of dictionary senses, applies to a particular occurrence of a word (McCarthy, 2009; Navigli, 2009). One of the oldest tasks in the field (Weaver, 1949), it is still a hard one. In some cases, it can be advisable to identify only the predominant sense for a domain (McCarthy et al., 2004). Another prominent task related to word sense is word sense induction, learning word senses from text data (Schütze, 1998).
Table 1: Examples of Phenomena at the Lexical Level
Word senses stand in semantic relations (Fellbaum, 1998), as exemplified in panel (B): the words star and champion are synonyms—or rather near-synonyms, as hardly any two words share all aspects of their meaning (Edmonds & Hirst, 2002). Relations usually hold between senses rather than words; the word star in the sense of champion has person as its hypernym, while star in the sense of a celestial body does not. An important computational task is to induce semantic relations from data (Hearst, 1992).
Individual words are not the only lexical items that need to be taken into account. The number of multiword expressions may actually be on the same order of magnitude as that of words (Jackendoff, 1997). Baldwin and Kim (2010) define multiword expressions as “lexical items that: (a) can be decomposed into multiple lexemes; and (b) display lexical, syntactic, semantic, pragmatic and/or statistical idiomaticity.” Panel (C) has a few examples. Multiword expressions are a heterogeneous group that includes light-verb expressions as in (4) (where the main predicate is the noun), collocations like (5), idioms like spill the beans, and noun compounds like mountain stream (Ramisch et al., 2013). They have famously been called a “pain in the neck for NLP” (Sag et al., 2002), because there are very many of them, and they vary widely in how decomposable they are—for example kick the bucket, when passivized, seems infelicitous. For spill the beans, passivized occurrences are rare but exist (the example in (6) is from an online message board). Some multiword expressions are completely opaque, like kick the bucket or by and large. On the other hand, the meaning of light-verb expressions like (4) can be inferred, but which verb serves as light verb for which noun has to be memorized. One computational problem involving multiword expressions is how to represent them in a principled fashion, in spite of their apparent messiness, in a formal representation of sentence semantics (Sag et al., 2002). Another question has been how to detect collocations automatically, using some notion of association or likelihood of co-occurrence (Choueka, 1988; Evert, 2005), expression length, and context specificity (Frantzi et al., 1998). A third question is how to automatically interpret multiword expressions via paraphrasing, e.g. in Nakov and Hearst (2013) or Shutova (2013).
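The association measures used for collocation detection can be illustrated with pointwise mutual information (PMI), which compares how often a word pair is observed together with what would be expected if the words occurred independently. The counts below are assumed toy values, not taken from a real corpus:

```python
import math

def pmi(pair_count, w1_count, w2_count, total_bigrams):
    """Pointwise mutual information: log2 of observed vs. expected co-occurrence."""
    p_pair = pair_count / total_bigrams
    p_w1 = w1_count / total_bigrams
    p_w2 = w2_count / total_bigrams
    return math.log2(p_pair / (p_w1 * p_w2))

# Toy counts (assumed for illustration): a collocation like "strong tea"
# co-occurs far more often than chance; a non-collocation does not.
print(pmi(pair_count=30, w1_count=100, w2_count=50, total_bigrams=10000))  # high, ~5.9
print(pmi(pair_count=1, w1_count=200, w2_count=200, total_bigrams=10000))  # below chance, -2.0
```

A high PMI indicates that the pair co-occurs more often than chance; real collocation extractors combine such scores with frequency thresholds and significance tests (Evert, 2005).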
Panel (D) shows some examples of metonymy, where a word or phrase is used to stand in for a related concept. In (7), Wall Street stands for people associated with Wall Street. Sentence (8) is an example of logical metonymy. In logical metonymy, a noun phrase or adjective, here the noun phrase the book, needs to be interpreted as an event. In this case, relevant events could be reading or writing the book. Tasks include deriving a principled account of this phenomenon (Pustejovsky, 1995) and automatically learning to interpret instances of logical metonymy (Lapata & Lascarides, 2003).
In metaphors, illustrated in panel (E), properties and entailments from some source domain are projected to a target domain.1 As Lakoff and Johnson (1980) argued, metaphor is not restricted to creative uses, nor is it restricted to individual metaphoric expressions. A single metaphor can give rise to many expressions, as in sentences (9) and (10), which are both instances of the metaphor “life is a journey.” Metaphoric expressions pose difficulties for automatic analysis. While some metaphoric expressions are listed as separate senses in some dictionaries (for example, the “person” sense of star in (1), which is metaphoric, is listed in WordNet), many are not. So it is necessary first to do metaphor identification, to tag expressions that are metaphoric, and then to interpret them, for example through paraphrasing (Shutova, 2015; Veale et al., 2016).
Paraphrasing (panel (F) in Table 1) is in a way the mirror image of polysemy: instead of a term with multiple meanings, we have multiple terms with the same or similar meanings. In text, there are usually many different ways to convey the same or almost the same meaning. Sometimes a word or phrase can be paraphrased by a single other word, as in (11), and sometimes by multiple words as in (12). Paraphrases sometimes stand in a semantic relation to the original word, for example, synonymy, hypernymy, hyponymy, or meronymy (the part-whole relation). But when people produce paraphrases, they are often not in any standard relation to the original word (Kremer et al., 2014), as example (11) illustrates: show and demonstrate are synonyms, but teach does not stand in any standard semantic relation to show. The possible meanings of a word can also be described in terms of paraphrases. In that case, each occurrence of a word can have its own mixture of paraphrases, while meaning descriptions in terms of dictionary senses have a fixed set of word meanings that all occurrences have to use (McCarthy & Navigli, 2009). Paraphrases for words and phrases can be learned automatically from corpus data (Lin & Pantel, 2001; Ganitkevitch et al., 2013; Berant et al., 2015). A paraphrase applies to a particular sense of a word or phrase, so it is also necessary to check how well a paraphrase fits in a particular sentence context (Erk & Padó, 2008; Szpektor et al., 2008; Szarvas et al., 2013).
Semantic roles (panel (G)) express the ways in which the arguments of a predicate can be involved in the event that it describes (Fillmore, 1968). In sentences (13) and (14), the predicate that introduces the event is the verb bite. Some predicates are verbs, but they can be nouns, adjectives, multiword expressions, or constructions as well (Fillmore et al., 2003). In both sentences (13) and (14), there are two roles, Agent and Patient. In both sentences, the Agent of the event, the one who deliberately performs the action, is the dog, and the Patient, the one who undergoes the action, is the mailman, even though the syntactic structure differs between (13) and (14). Semantic roles express “who does what to whom.” Which semantic roles are available depends on the sense of the predicate: In Pass the peas, the direct object can be described as a Patient, but in The train passes a lake, it is a Path. There is no clear-cut rule to decide how many different roles should be distinguished. In the limit, each predicate sense could be said to impose its own set of entailments on its roles (Dowty, 1989). So different semantic role resources (which will be described in Section 2.1) have made different design decisions in the role sets they assume. Semantic role labeling, the task of automatically assigning semantic roles, involves both identifying some semantic class of the predicate and finding and labeling role-bearing constituents (Gildea & Jurafsky, 2002; Täckström et al., 2015). Some semantic role occurrences are implicit; that is, they are not syntactically linked to the predicate and may even be in adjacent sentences. In that case, it is necessary to do inference over neighboring predicates and their arguments (Gerber & Chai, 2010). 
The aim of unsupervised semantic role labeling (where “unsupervised” means that there is no labeled training data) is to extend semantic role labeling to examples not covered by manually constructed resources (Swier & Stevenson, 2004; Lang & Lapata, 2014).
Semantic roles impose constraints on the properties of their fillers, called selectional preferences: the agent of eat tends to be sentient, and the patient of eat tends to be foodstuff. Examples (15) and (16) in panel (H) illustrate this point: does a barman make a better Agent or Patient of eating? How about an apple? Selectional preferences can be learned from text data by generalizing over observed role fillers (Resnik, 1996; Erk et al., 2010; Ó Séaghdha & Korhonen, 2014). In contrast to the other phenomena discussed so far, selectional preferences are not so much a problem to solve in order to do successful sentence analysis as a set of constraints that can guide processing for other tasks, such as semantic role labeling (Zapirain et al., 2010).
1.2 Phenomena at the Sentence Level
Representing the semantics of a sentence means representing all of its lexical items and the way they are linked into an overall whole. Many different representation formalisms have been developed for this. Figure 1 shows examples of three sentence representation formalisms. On the left is an example of a logic-based representation, in this case a Discourse Representation Structure (Kamp, 1995; Kamp & Reyle, 1993), or DRS for short, for the sentence John the barber does not shave himself. The top part of the box contains the discourse referents mentioned, in this case x. The rest of the box contains the statements made about the discourse referent, in this case that it is named John and that it is a barber. It also contains a sub-box that is negated using the ¬ symbol, stating that it is not the case that there is a discourse referent e that is a shaving event of which both the agent and the patient are x. This representation can be converted into an equivalent representation in first-order logic.
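Spelled out in first-order logic, the DRS just described corresponds to the following formula, with a neo-Davidsonian event variable e for the shaving event:

```latex
\exists x\, \big(\mathit{named}(x,\mathit{John}) \land \mathit{barber}(x) \land
  \neg\exists e\, (\mathit{shave}(e) \land \mathit{agent}(e,x) \land \mathit{patient}(e,x))\big)
```

The outer box translates to existential quantification over x, and the negated sub-box to a negated existential over the event variable.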
On the top right is an example of a semantic network. This is a graph where nodes represent concepts and the edges represent relations. These networks can be used for general knowledge representation or for representing sentence meaning (Sowa, 1992). This particular example is a Conceptual Dependency graph (Schank & Tesler, 1969) for the sentence The big man steals the red book from the girl, where different arrow types denote different types of relations.
Sentence representations can also build on and extend semantic role structures, as in the case of the Abstract Meaning Representation (Banarescu et al., 2013), AMR for short, for The boy wants to go on the bottom right of Figure 1. Building on the PropBank semantic role resource (Palmer et al., 2005), the main predicate is assigned PropBank’s semantic class want-01, and is also assigned a discourse referent w. Its arg0, or Agent, is the boy with discourse referent b, and its arg1, or Patient, is the predicate go, assigned its first sense go-01. The arg0 (Agent) of go is b, the discourse referent for the boy: the entity that the boy wants to go is the boy himself.
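In the textual (PENMAN-style) notation used in the AMR bank, this analysis can be written as follows; the reuse of the variable b under go-01 captures the co-reference just described:

```
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))
```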
These examples have already shown some of the phenomena that sentence representations have to cover beyond lexical semantics: negation and coreference. Table 2 gives further examples of these and additional phenomena. Sentences (17) and (18) are examples of coreference (Poesio et al., 2015). In (17) the pronouns her and she refer to Victoria Chen, and the two noun phrases Megabucks Banking Corp and the company corefer. Event coreference, as in (18), is harder than pronoun and noun coreference, as with event coreference, the exact extent of the event that is being referred to is often unclear.
Table 2: Examples of Some Phenomena at the Sentence Level
A quantifier is an expression like all, some, or no. A quantified noun phrase is an expression like every bacterium and a map in sentence (19) in Table 2. Sentence (19) is ambiguous: It could mean that there is a single map that covers all bacteria, or that the project will create a separate map for each bacterium. This is called a quantifier scope ambiguity, because it can be phrased as the question of which of the two quantifiers has scope over the other. While there are good approaches for the task of compactly representing scope ambiguity (Egg et al., 2001; Copestake et al., 2005), it is still an open question how to predict a hearer’s preferences on quantifier scope.2
Negation also enters into ambiguities with quantifiers, as (20) shows. This sentence is ambiguous between none of the merchandise being evidence, and part of the merchandise not being evidence. The scope of negation—that is, which parts of a sentence are actually being negated—is relevant for many applications, like deciding whether one sentence can be inferred from another (Dagan et al., 2013) or deciding whether a sentiment that is being expressed is positive or negative (Choi & Cardie, 2008), so this task has attracted interest on its own (Morante & Blanco, 2012).
The problem with temporal expressions, as in (21), is that they are often relative and need to be resolved. In (21), the first temporal expression, on Wednesday, is relative to the document creation time, and the second temporal expression, two days later than expected, is relative to the first. Reasoning over temporal expressions is a difficult task (Verhagen et al., 2007).
Recognizing textual entailment (RTE) (Dagan et al., 2013) means determining, for a given pair consisting of a text T and a hypothesis H, whether, after reading T, a human would conclude that H likely also holds. This is an artificial task, not an application, but it showcases problems arising in many applications, including information extraction and question answering. It is an important task for computational semantics because RTE datasets typically exhibit a wide variety of semantic phenomena. RTE datasets can be tailored to particular phenomena of interest, like lexical and sentence phenomena to the exclusion of world knowledge (Marelli et al., 2014), sentence-level phenomena such as generalized quantifiers (Cooper et al., 1996), or phenomena that are difficult for parsers, such as prepositional phrase attachment (Yuret et al., 2013).
Many applications have approaches that make use of some form of semantic representation, for example question answering, information extraction, sentiment analysis, summarization, reading comprehension, sentence similarity, or machine translation (Jones et al., 2012). Not all approaches address all the phenomena mentioned in this section, and not all approaches need to. More in-depth semantic representations provide opportunities for deeper understanding but also opportunities for additional processing errors.
2. Critical Analysis of Scholarship
This section highlights some of the core strands of research and pervasive questions in the field. The first is the development of manually constructed resources, which has been vastly important in driving the field forward. Computational linguistics relies on the availability of lexical resources like dictionaries and taxonomies, on text corpora labeled with linguistic information, and on other data collections. Section 2.1 discusses some of the most influential of these resources. In addition to using manually constructed resources, researchers in computational linguistics also use the vast amounts of electronic text that are available to extract information about the meanings of words and phrases. Section 2.2 is about the induction of meaning representations from text, and Section 2.3 focuses on sentence meaning representations. There are a number of different ways to represent the overall semantic structure of sentences and stretches of discourse. But one tradition has been particularly influential to computational semantics as a whole, namely logic-based approaches. They will be discussed, as will recent uses and variations of this tradition.
A fourth major theme that cuts across the other three is machine learning, the use of methods that learn to adapt their response based on the data they see. Computational semantics uses many different forms of machine learning: supervised, where labeled training data is given; unsupervised, where no labeled data is given and regularities are detected in raw data; and semi-supervised, which combines learning from labeled and unlabeled data, or relies on labeled data that is not directly appropriate to the task. It also uses many different forms of data. Sections 2.1, 2.2 and 2.3 discuss the use of machine learning in the context of human-generated resources, corpus extraction of meaning information, and logic-based approaches. Finally, Section 2.4 points out some overarching themes and questions in the field.
2.1 Lexical Resources and Annotated Corpora
The development of manually constructed resources has been one of the main driving forces in the field. These resources specify the linguistic structures to be targeted in automatic analysis, and they provide high-quality human-generated data that can be used to train machine learning systems.
Arguably the most influential resource has been WordNet (Fellbaum, 1998), a lexical database that contains information on nouns, verbs, adjectives, and adverbs. It organizes words into sets of synonyms, called synsets. Synsets are linked through semantic relations; for nouns the main relation is hypernymy. For an individual word, the collection of synsets in which it appears functions as its collection of senses. Figure 2 shows an example excerpt from the WordNet noun hierarchy. The terms lecture, public lecture, talk together form a synset, which is further characterized by a gloss, a speech that is open to the public, and an example use. The hypernym of this synset (as mentioned before, hypernymy is a relation between senses, or lexicalized concepts, not words) is a synset containing address and speech, which has as its hypernym speech act. Several further hypernymy links then lead to the synset event, and finally to the most general synset entity. So WordNet can be used as a dictionary with senses for each word, as a thesaurus that provides synonymy classes, and as a taxonomy that allows for generalization over concepts. For example, Resnik (1996) uses the WordNet noun hierarchy to compute selectional preferences, generalizing over observed role-filling nouns to encompassing categories: if apple, pasta, and cheese have been seen as the direct object (or the patient) of eat, this provides good evidence for the synset food as a characterization of the selectional preference. Similarly, Girju et al. (2006) use the WordNet noun hierarchy to generalize over observed word pairs while learning part/whole pairs from data.
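The generalization step in this kind of approach can be sketched with a toy hypernym hierarchy; the words and categories below are assumed for illustration and do not reflect WordNet’s actual synset inventory:

```python
# Toy hypernym map (assumed for illustration; not WordNet's actual structure).
HYPERNYM = {
    "apple": "fruit", "pasta": "dish", "cheese": "dairy",
    "fruit": "food", "dish": "food", "dairy": "food",
    "food": "entity",
}

def ancestors(word):
    """Return the chain of hypernyms from a word up to the root, most specific first."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain

def generalize(fillers):
    """Most specific category that covers all observed role fillers."""
    common = None
    for w in fillers:
        chain = ancestors(w)
        common = chain if common is None else [c for c in common if c in chain]
    return common[0] if common else None

print(generalize(["apple", "pasta", "cheese"]))  # food
```

Resnik’s actual model goes further, weighting each candidate category by how strongly the observed fillers concentrate under it relative to its overall corpus frequency; the sketch only finds the lowest category covering all fillers.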
WordNet can also be used as a graph that defines semantic similarity between lexicalized concepts based on distance in the graph. The simplest WordNet-based similarity measure uses the number of edges traveled from one synset to another as an indicator for the degree of similarity between two synsets; more complex ones draw on corpus data to estimate the “length” of each WordNet graph edge. These approaches are summarized and tested in Pedersen et al. (2004) (who also provide an implementation) and Budanitsky & Hirst (2006).
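The edge-counting idea can be sketched as follows, on an assumed toy fragment of the noun hierarchy treated as an undirected graph; one simple variant scores similarity as the inverse of the path length plus one, so that identical synsets score 1 and more distant ones score less:

```python
from collections import deque

# Toy undirected synset graph (assumed fragment, loosely following Figure 2).
EDGES = {
    "lecture": ["address"],
    "address": ["lecture", "speech_act"],
    "speech_act": ["address", "event"],
    "event": ["speech_act", "entity"],
    "entity": ["event"],
}

def shortest_path(a, b):
    """Breadth-first search for the number of edges between two synsets."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nbr in EDGES.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None  # not connected

def path_similarity(a, b):
    """Inverse path length: 1 for identical synsets, smaller when farther apart."""
    d = shortest_path(a, b)
    return None if d is None else 1 / (1 + d)

print(path_similarity("lecture", "speech_act"))  # 1/3
```

The corpus-weighted measures mentioned above replace the uniform edge count with information-content estimates for each node, so that an edge high up in the hierarchy counts as a longer “distance” than one between specific concepts.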
There are several corpora in which word occurrences have been tagged with their most context-appropriate WordNet senses. These annotated corpora have been used to train supervised classifiers for the task of word sense disambiguation. For supervised classifiers, each training or test instance is turned into a vector of features, and the classifier learns to generalize over the feature vectors of the training instances so that it will be able to label novel test instances. For word sense disambiguation, the feature vector typically contains information about the local context as well as the wider topical context of the target word occurrence, and also its syntactic context (Navigli, 2009).
WordNet with its rich structure also lends itself to various knowledge-based approaches to word sense disambiguation, that is, approaches that exploit information in the lexical database instead of annotated corpus data. A simple knowledge-based approach by Lesk (1986) uses overlap in sense definitions between a word and its context words to decide on the sense to assign. Other knowledge-based approaches make use of the graph structure of WordNet to select closely connected senses for all words in a sentence, in some cases using one of the WordNet-based similarity measures defined above (Sinha & Mihalcea, 2007; Agirre et al., 2014).
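The Lesk overlap idea can be sketched in a few lines; the sense inventory and glosses below are invented for illustration, whereas a real implementation would draw them from WordNet:

```python
def lesk(context_words, senses):
    """Simplified Lesk: pick the sense whose gloss overlaps most with the context."""
    def overlap(gloss):
        return len(set(gloss.lower().split()) & {w.lower() for w in context_words})
    return max(senses, key=lambda s: overlap(senses[s]))

# Assumed mini sense inventory for "bank" (glosses invented for illustration).
senses = {
    "bank#river": "sloping land beside a body of water",
    "bank#finance": "an institution that accepts deposits and lends money",
}
context = "she sat on the bank of the river watching the water".split()
print(lesk(context, senses))  # bank#river
```

Even this crude version captures the intuition: the words water and of in the context overlap with the river gloss but not the finance gloss. Refinements remove stopwords, extend glosses with those of related synsets, and weight overlaps.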
The original WordNet is an English resource, but it has been the launching point for resources for other languages as well. EuroWordNet (Vossen, 2004) is a collection of WordNets in multiple European languages, connected through the Inter-Lingual Index, which links equivalent synsets in different languages. BabelNet (Navigli & Ponzetto, 2012) links WordNet to Wikipedia, an online encyclopedia. The resource uses the category structure underlying Wikipedia pages (which is a graph), along with human-generated translations of Wikipedia categories to other languages, to collect multilingual lexical knowledge.
For semantic roles, there are several prominent resources that differ in their design decisions on which and how many semantic roles they distinguish. FrameNet (Fillmore et al., 2003) is primarily a lexicographic project that identifies semantic classes of predicates, called frames, where predicates in the same class evoke the same prototypical situation or state and have the same semantic roles. Semantic roles are specific to each frame. Frames are organized in an inheritance hierarchy, with more specific frames, like Bungling, inheriting from more abstract ones, like Event. The frame inheritance hierarchy includes semantic role inheritance. Figure 3 shows the description of the Bungling frame, which, besides bungle, also contains predicates such as the verb bumble and the noun fiasco. The semantic roles are the terms highlighted in color in the figure. FrameNet also provides data annotated with frames and semantic roles, which again enables the construction of supervised classifiers. The original FrameNet resource is for English; FrameNet resources also exist for other languages, including Spanish (Subirats, 2009), Japanese (Ohara, 2012), and Swedish (Borin et al., 2010).
PropBank (Palmer et al., 2005) is another prominent semantic role annotation resource. PropBank adds semantic role annotation to the Penn Treebank (Marcus et al., 1993) while keeping generalizations at a minimum. PropBank roles are verb sense specific, except for Arg0 and Arg1, which uniformly indicate Agent and Patient. Whenever possible, multiple senses of a verb that have the same semantic roles are merged into a single role set (called a frameset). Figure 4 shows an example of a frameset for the verb botch with an annotated example sentence. PropBank has role descriptions exclusively for verbs; NomBank (Meyers et al., 2004) extends the annotation scheme to noun predicates. PropBanks also exist for other languages, including Chinese (Xue & Palmer, 2008) and Arabic (Palmer et al., 2008). While FrameNet groups predicates with the same semantic role structure into frames, PropBank does not generalize across predicates.
There are many systems that do automatic semantic role labeling for PropBank and FrameNet (Màrquez et al., 2008). The first system, by Gildea and Jurafsky (2002), treated semantic role labeling as a combination of three supervised classification tasks, using the manually annotated data provided with FrameNet as training data. The first of the three classifiers assigned a semantic class to the predicate, the second classified constituents for whether they bear roles, and the third assigned role labels to constituents previously classified as role-bearing. A recent system by Das et al. (2014) includes FrameNet frame assignment for targets not seen in the training data, based on seen prototypes. It treats role labeling as a classification task but follows it with a decoding step that makes sure no constituent is labeled with more than one role.
VerbNet (Kipper-Schuler, 2005) builds on the Levin classes, a semantic classification of verbs by their syntactic and semantic behavior (Levin, 1993). VerbNet has verbs grouped into semantic classes, with semantic role listings for each class, and with a formal definition of class properties. Figure 5 has an example, the class remedy-45.7, a very wide class which also includes the verb bungle, along with similar verbs like botch. The formal definition of the class is in the line labeled SEMANTICS. SemLink (Palmer et al., 2007) provides links between classes of FrameNet, PropBank, and VerbNet. It also links the semantic predicate classes to OntoNotes senses (Pradhan et al., 2007), groupings of WordNet senses into coarser classes. OntoNotes also provides corpus annotation with these coarse-grained senses, PropBank roles, and coreference information.
For temporal expressions, TimeBank (Pustejovsky et al., 2003) provides corpus data in which events are annotated for their temporal relations. For whole-sentence semantic representations, the Groningen Meaning Bank (Bos et al., 2017) offers annotation with Discourse Representation Structures, and the AMR bank (Banarescu et al., 2013) provides text annotated with Abstract Meaning Representations.
This is only a small selection of the resources that exist. There are consortia that store and provide linguistic resources, the Linguistic Data Consortium (LDC) for America and the European Language Resources Association (ELRA) for Europe, which have many resources that are relevant to semantics. There is also a workshop series that has greatly advanced both the production and use of semantic resources. It was originally called SensEval and was conceived as an effort to provide common sets of training and test data to make word sense disambiguation systems comparable (Kilgarriff, 1998). Now called SemEval, the workshop has grown to provide 10–20 different datasets and tasks each year. For example, in 2015 it comprised tasks on paraphrasing and sentence similarity, time and space analysis, sentiment, word sense disambiguation and induction, and semantic relation learning. Datasets typically remain available after the end of the competitions and thus provide important resources and points of comparison for later research.
2.2 Inducing Lexical Information From Data
A second core strand of research in computational semantics has been learning lexical knowledge from text data. Today large amounts of text are available electronically, on the web and through specially collected corpus resources. By processing this text and accumulating the clues that it offers, we can derive approximate and noisy representations of lexical meaning. Learning lexical knowledge from text is useful because the lexicon is so large and keeps growing and changing, especially when we also take multi-word expressions into account. Also, domains like scientific texts or manuals have their own specialized vocabulary, which may not be reflected well in manually constructed general-purpose resources. Another argument in favor of inducing lexical information from corpora is that human-generated and automatically compiled lexical knowledge are often complementary to some extent, with each of them providing some information that the other misses.
2.2.1 Pattern-Based Approaches
Text contains information about relations between words, for example hypernymy. This information can be given explicitly, as in Cats are animals. But as Hearst (1992) pointed out, it may also be given implicitly, as in the following example from that paper:
From this sentence we can infer with certainty that the bambara ndang is a bow lute. Besides such as, Hearst identified a number of further patterns that indicate hypernymy. Here are a few of them:
Such patterns, called Hearst patterns, are widely used to detect hypernymy and other semantic relations. However, while Hearst patterns yield high-precision information for hypernymy, they are noisier for other relations. Girju et al. (2006) use Hearst patterns to induce part-whole relations, but the patterns are not nearly as reliable as for hypernymy: for example, the wheels of the car indicates a part-whole relation, but the owner of the car does not.
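As an illustration, the core of a Hearst-pattern matcher can be sketched in a few lines of Python. The sketch below is a deliberate simplification under invented example sentences: it covers only two patterns and matches single words, whereas real systems match noun phrases over parsed or chunked text.

```python
import re

# Two Hearst-style patterns, each paired with a function that orders the
# matched words into a (hyponym, hypernym) pair.
PATTERNS = [
    # "HYPERNYM such as HYPONYM"
    (re.compile(r"(\w+) such as (\w+)"), lambda m: (m.group(2), m.group(1))),
    # "HYPONYM and other HYPERNYM"
    (re.compile(r"(\w+) and other (\w+)"), lambda m: (m.group(1), m.group(2))),
]

def extract_hypernym_pairs(text):
    """Return all (hyponym, hypernym) pairs found by the patterns above."""
    pairs = []
    for regex, build in PATTERNS:
        for m in regex.finditer(text):
            pairs.append(build(m))
    return pairs

pairs = extract_hypernym_pairs(
    "Instruments such as violins are hard to build. "
    "Cats and other animals sleep a lot."
)
print(pairs)  # [('violins', 'Instruments'), ('Cats', 'animals')]
```

Even this toy version shows why the patterns are high-precision but low-recall: each pattern fires rarely, but when it does, the extracted pair is usually correct.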
In her seminal paper, Hearst also proposed a technique for finding more patterns, namely to search for word sequences linking known hyponym/hypernym pairs. This technique has been extended into a bootstrapping cycle that infers new patterns for a relation from known pairs of words that stand in that relation, and then new word pairs from the new patterns (Riloff & Jones, 1999). This has been used primarily for fine-grained semantic relations needed for information extraction (Etzioni et al., 2004; Pantel & Pennacchiotti, 2006).
2.2.2 Distributional Approaches
Distributional approaches build on the idea that words with similar meanings tend to appear in similar contexts. They represent a word through the contexts in which it has been observed to appear (Turney & Pantel, 2010; Erk, 2012). The semantic similarity of two words can then be estimated based on the similarity of their observed contexts. Originating in the vector space models of information retrieval (Salton et al., 1975), distributional models were proposed in computational linguistics as a way to model human word learning and similarity ratings (Landauer & Dumais, 1997). Figure 6 illustrates the idea of distributional models on a toy dataset. The target word sofa (underlined) is observed in four sentences. As the context of sofa we count all words in a window of three words on either side of the target (italicized). The resulting table of counts (bottom left) can be interpreted as a point in a high-dimensional space, with one dimension per context word. The graph on the bottom right of the figure illustrates this for just two dimensions, sleep and fry: The word sofa has a value of 2 on the dimension sleep, and 1 on fry. There are several ways to compare sofa to, say, kitchen, which in the figure we have assumed to have a count of 1 on sleep and a higher count on fry. One popular method is to compute the cosine of the angle between the two vectors, which will be 1 if they point in the same direction and lower otherwise. Distributional models and their similarity predictions have been used in many creative ways: to predict word similarity (Landauer & Dumais, 1997), to automatically construct thesauri (Lin, 1998), to find synonyms (Landauer & Dumais, 1997), to characterize selectional preferences (Erk et al., 2010), to predict paraphrase goodness of fit (Erk & Padó, 2008), and to model analogy (Turney, 2006). 
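The cosine comparison from Figure 6 can be computed directly. The sketch below uses the counts stated in the text for sofa (2 on sleep, 1 on fry) and assumes concrete values for kitchen (1 on sleep and, as described, a higher count on fry; the value 4 is invented for illustration):

```python
import math

# Toy count vectors: one dimension per context word.
sofa = {"sleep": 2, "fry": 1}
kitchen = {"sleep": 1, "fry": 4}  # "higher count on fry", value assumed

def cosine(u, v):
    """Cosine of the angle between two sparse count vectors."""
    dims = set(u) | set(v)
    dot = sum(u.get(d, 0) * v.get(d, 0) for d in dims)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)

print(round(cosine(sofa, sofa), 3))     # 1.0: same direction
print(round(cosine(sofa, kitchen), 3))  # 0.651: similar but not identical
```

Because cosine depends only on the angle between vectors, a frequent word and a rare word with similar context profiles still come out as similar.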
Schütze (1998) uses distributional models to induce word senses from data: he represents each individual occurrence of a target word as the sum of the distributional representations of all words in the sentence, then clusters the occurrence vectors into senses. Lin and Pantel (2001) realized that “phrases with holes,” such as X solves Y or X finds a solution to Y, can be characterized by their contexts (in their case, the seen fillers for X and Y ) in the same way as words, and they use this insight to learn inference rules as phrases-with-holes that appear in similar contexts: if X solves Y applies to a pair (X, Y ), then it is likely that X finds a solution to Y also does. Mitchell and Lapata (2010) raise the question of whether word representations can be composed into phrase representations in order to predict similarity of phrases. This question will be taken up again in Section 2.3, which discusses compositionality.
The model in Figure 6 is a count-based distributional model. There are two other prominent approaches to computing distributional models. Like the vector space models in information retrieval, topic models (Blei et al., 2003; Steyvers & Griffiths, 2007) represent documents distributionally. They are Bayesian models that describe a document as probabilistically generated from a number of topics, each of which probabilistically generates words. Like count-based models, they can also be used to model words, and like count-based models they have been used to predict paraphrase goodness of fit (Dinu & Lapata, 2010) and to model selectional preferences (Ó Séaghdha & Korhonen, 2014).
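The generative story behind topic models can be made concrete with a small sketch (the two topics, their word distributions, and the document's topic mixture below are all hypothetical): for each word of a document, first draw a topic from the document's mixture, then draw a word from that topic.

```python
import random

random.seed(0)

# Hypothetical topics: each topic is a probability distribution over words.
topics = {
    "cooking": {"fry": 0.5, "kitchen": 0.3, "pan": 0.2},
    "furniture": {"sofa": 0.4, "sit": 0.4, "sleep": 0.2},
}

def sample(dist):
    """Draw one item from a {item: probability} distribution."""
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item  # guard against floating-point rounding

def generate_document(topic_mixture, length=8):
    """For each word: pick a topic from the mixture, then a word from it."""
    return [sample(topics[sample(topic_mixture)]) for _ in range(length)]

print(generate_document({"cooking": 0.7, "furniture": 0.3}))
```

Inference in an actual topic model runs this story in reverse: given only the documents, it recovers likely topics and mixtures.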
The third prominent approach is prediction-based models. They use deep learning models, neural network models that can consist of more than three layers (hence deep). These models can be used in a supervised machine learning setting to make predictions on some task, typically language modeling, the task of predicting how a word sequence will be continued (Collobert & Weston, 2008; Mikolov et al., 2013; Pennington et al., 2014). Deep neural networks can learn an internal representation of the data that best fits the task, and if the task is predicting the context of a word, then the internal representation of a word, called its embedding, will characterize the contexts in which it tends to appear, so it functions as a distributional model. Prediction-based models provide good performance on many tasks (Baroni et al., 2014), though the exact relation of count-based and prediction-based models, and the techniques that can be used with them, is still being explored (Levy & Goldberg, 2014; Levy et al., 2015). Perhaps the biggest advantage of prediction-based models is that, through a change in their objective function, they can be made to learn additional constraints. They can be tailored to tasks like sentiment analysis or logical inference (Socher et al., 2012), or an existing set of embeddings can be used as the basis for additional task-specific training. This technique is widespread, with examples including Wieting et al. (2015) for paraphrasing, Kruszewski and Baroni (2015) for concept compatibility, Bowman et al. (2015a) and Rocktäschel et al. (2015a) for textual entailment, and Kumar et al. (2016) for question answering. Word embeddings that have been widely reused include the model of Collobert and Weston (2008), the GloVe model of Pennington et al. (2014), and the word2vec model of Mikolov et al. (2013).
The word2vec model has a particularly simple structure, and variants of this model have been proposed for a number of different tasks, including distributional phrase representations (Pham et al., 2015) and paraphrase appropriateness in context (Melamud et al., 2015).
2.2.3 Multimodal Data
When text co-occurs with images, this combination can be used for lexical learning as well. For example, in a corpus of images with captions, many of the words in the captions can be expected to refer to things depicted in the accompanying images. So a joint textual and visual representation can be learned from both the words surrounding a target word in a caption and the image that the caption labels. Such multimodal distributional models can be used simply as more expressive meaning representations (Bruni et al., 2012), or for image annotation and text illustration (Feng & Lapata, 2010). To learn multimodal distributional models, the images are decomposed either into low-level image snippets (Feng & Lapata, 2010; Bruni et al., 2012) or into high-level visual attributes (Silberer et al., 2013). Then textual and visual features are combined to produce distributional models. Young et al. (2014) use images with multiple captions, along with simplifications of those captions, to define what they call visual denotations of terms, which they then use to compute similarity between terms based on the overlap between images that they describe.
Multimodal data is not only used to compute distributional models. By learning to associate elements of a visual or otherwise grounded representation of a scene or an event, it is possible to automatically generate text that describes what is happening in the grounded representation (Farhadi et al., 2010; Chen & Mooney, 2008; Venugopalan et al., 2015).
2.2.4 Cross-Lingual Approaches
It is often hard to decide how to group the occurrences of a word into senses. Resnik and Yarowsky (2000) propose that the senses to be distinguished should be the ones that receive different translations into other languages. Bannard and Callison-Burch (2005) introduced the converse of this idea: that words or phrases that have the same translation are likely to be paraphrases. The corpora needed to make this inference are parallel corpora that have the exact same sentences in multiple languages, such as the proceedings of the European Parliament (Koehn, 2005). Ganitkevitch et al. (2013) extend the idea of Bannard and Callison-Burch into the ParaPhrase DataBase (PPDB), a large resource of automatically generated paraphrases for words and phrases. Cross-lingual approaches are often combined with manually created resources or other ways of inducing lexical knowledge, so additional cross-lingual approaches are discussed in the next section.
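The pivot computation behind this idea can be sketched as follows: score an English phrase e2 as a paraphrase of e1 by summing, over foreign translations f shared by both, p(f | e1) · p(e2 | f). The probability tables below are hypothetical toy values, not estimates from a real parallel corpus.

```python
# p(foreign | english), estimated in practice from aligned parallel text
p_f_given_e = {
    "under control": {"unter kontrolle": 0.8, "im griff": 0.2},
}
# p(english | foreign), estimated from the same alignments
p_e_given_f = {
    "unter kontrolle": {"under control": 0.7, "in check": 0.3},
    "im griff": {"in check": 0.5, "under control": 0.5},
}

def paraphrase_prob(e1, e2):
    """Paraphrase score of e2 for e1 via foreign-language pivots."""
    return sum(p_f * p_e_given_f[f].get(e2, 0.0)
               for f, p_f in p_f_given_e[e1].items())

# 0.8 * 0.3 + 0.2 * 0.5 = 0.34
print(round(paraphrase_prob("under control", "in check"), 2))  # 0.34
```

Summing over pivots means that a paraphrase supported by several independent translations receives a higher score than one supported by a single, possibly noisy alignment.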
2.2.5 Combinations of Approaches
Of course the different approaches to inducing semantic knowledge can be, and have been, combined with each other and with manually created resources; only a few examples are given here for the many ways in which techniques can be blended. In the discussion of multimodal models, the possibility of combining textual distributional models with image information has already been mentioned. Distributional evidence can also be combined with pattern-based information for better relation inference (Pennacchiotti & Pantel, 2009), and with cross-lingual evidence for better paraphrase scoring: Ganitkevitch et al. (2013) consider terms as paraphrase candidates if they have the same translations, but then use distributional similarity to rerank these candidates and provide confidence scores for them.
Corpus induction methods can be used to extend existing manually created resources. The approach of Snow et al. (2006) is a nice example that combines a lexical resource with both pattern-based and distributional approaches: Snow et al. use Hearst patterns to predict hypernymy and distributional evidence to predict “cousinhood” (two concepts having the same parent, grandparent, etc. in a taxonomy), and they reason over the structure of the WordNet hierarchy along with these two types of evidence to decide whether and where a new term should be inserted into WordNet. Also, parallel corpora can be used to project semantic information from one language to another. For example, Padó and Lapata (2005) manually annotate data in a source language with semantic roles in the FrameNet paradigm, then project this information to the target language, in order to facilitate the development of a frame-semantic resource for the target language.
Conversely, WordNet and other manually constructed resources can be used to inform the learning of distributional information. For example, Boyd-Graber et al. (2007) build a topic model that uses WordNet synsets as a hidden variable, that is, information that is not given in the data but that the model must infer, in order to improve word sense disambiguation. Faruqui et al. (2015) improve word embeddings by constraining them to conform to WordNet, PPDB, and FrameNet to improve performance on word similarity tasks.
2.2.6 “Accidental” Resources
All the approaches in this section make use of data that was generated not as a resource for computational semantics but for other purposes. Text on the web and in news corpora is created for communication between humans, and the same holds for images and videos accompanied by text. Parallel corpora arise from translations for use by humans. There are more of these “accidental” resources that were created for other purposes but constitute good sources of data for computational semantics. A prominent one is Wikipedia (Medelyan et al., 2009): it can be used as a concept graph based on both its concept hierarchy and the hyperlinks in its articles (Strube & Ponzetto, 2006), and the Wikipedia edit history can be mined for typing errors and their corrections, which can then be used to train a spelling correction system. Other examples include images and their human-generated captions on Flickr, or movie transcripts. As the field moves towards more “data-hungry” learning methods, in particular deep learning, the discovery of large “accidental” resources becomes more important. A recent example is the text comprehension dataset of Hermann et al. (2015), who make use of the fact that some newspapers provide short summaries of their articles. Large amounts of training and test data for text comprehension can then be generated automatically by withholding one of the proper names in the summary; the task is to guess which of the proper names in the main article it is.
2.3 Compositional Semantics and Logical Form
The third core theme in computational semantics is formal, logic-based representations of sentence meaning, and compositionality—the systematic construction of meaning representations for larger expressions from the representations of their parts.
2.3.1 Logic-Based Sentence Representations
The aim of logic-based approaches in computational semantics is to translate sentences into a formal representation that is well suited for inferences and other automatic processing.
Representing the meaning of a sentence through logic has a long tradition both in linguistic semantics (Montague, 1970; Dowty et al., 1981; Kamp & Reyle, 1993) and in computational semantics (Blackburn & Bos, 2005; van Eijck & Unger, 2010). (24) shows an example of a sentence and its representation in first-order logic. It reads: For any (∀) thing, call it x, if it is a mouse, then (→) for all things y, if y is a cat, then there is (exists, ∃) an e that is a fearing event and (∧) x is the agent and (∧ again) y is the patient of the event. All content words in the sentence are represented as predicates in the logic, such as mouse′, while structural parts of the sentence are represented through quantifiers (∃, ∀) and operators (∧, →). The representation we have chosen is neo-Davidsonian (Davidson, 1967; Parsons, 1990): it treats events as entities with their own variables, in our case e. Another choice that this particular representation makes is not to represent word sense ambiguity. For example, the sentence contains the word mouse, which can mean a small animal or an electronic device. We could also choose to represent ambiguity, for example by attaching the WordNet sense number to the predicate name, which in this case would make it mouse′#1.
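Putting the pieces of this gloss together, the sentence in (24), All mice fear all cats, can be written out as (a reconstruction following the gloss term by term):

∀x (mouse′(x) → ∀y (cat′(y) → ∃e (fear′(e) ∧ agent′(e, x) ∧ patient′(e, y))))

Each conjunct under ∃e corresponds to one clause of the paraphrase above: the event is a fearing, x is its agent, and y is its patient.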
When meaning is represented through logic, it is natural to formulate natural language understanding tasks as inference tasks to be performed by a theorem prover, a system that performs automated reasoning over formal representations to check whether a given argumentation is valid (Blackburn & Bos, 2005; Bos, 2011). For example, Bos and Markert (2005) apply a logic-based system to the Textual Entailment task by asking whether the Text logically entails the Hypothesis.
(24) uses first-order logic. This is a common choice, but not the only one. Discourse representation structures, as in Figure 1, have advantages during semantics construction, the assembly of a formal representation, which is discussed in Section 2.3.2. If scope ambiguity needs to be represented compactly, there are several choices of so-called underspecified logical representations, for example Egg et al. (2001) and Copestake et al. (2005). When the main concern is efficient inference, it often makes sense to use a logic that is less expressive but decidable, such as description logic (Baader, 2003). There are also specialized languages for interfacing with particular databases or robots (Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Liang et al., 2011).
The idea of natural logic is to perform inference directly over natural language, rather than a formal representation. An idea with a long history, it was recently cast into a formal inference system by MacCartney and Manning (2009). They propose a system of relations between pairs of terms that characterize set relations between the corresponding categories. For example, they write “crow ⊏ bird” to say that all entities that are crows are also in the set of birds, and “cat | dog” to say that there is no entity that is both a cat and a dog, though there may be entities that are neither. Bowman et al. (2015b) use deep learning to assign natural logic relations to term pairs, and Pavlick et al. (2015) integrate natural logic into the PPDB paraphrase database.
2.3.2 Semantics Construction and Compositionality
Semantics construction is an important problem for sentence representations. It is the question of how to construct representations for all the infinitely many sentences that a hearer could encounter. The core idea for how to do this is the principle of compositionality, which posits that the meaning of a phrase is determined by the meanings of its components and the relations between them. If that is so, meaning representations can be constructed by a bottom-up traversal of a syntactic parse tree, where representations for lexical items are listed in the lexicon, and representations for larger expressions are constructed based on the representations of their parts. This section looks at semantics construction in more detail for logic-based representations like the one in (24). Figure 7 shows a very simple example, in which the representation for John, as listed in the lexicon, is john′. This is passed up from the lexical node to its noun phrase (NP) parent. The meaning representation of walks, walk′, is likewise passed up from the lexical node to its verb phrase (VP) parent. The semantics construction rule for the sentence (S) node then says to apply the representation of the VP to the representation of the NP, yielding walk′ (john′).
But in general, the lexical and intermediate representations need to be more complex than that. Compare the two sentences (24) and (25) (All mice fear all cats and All mice are fast):
This shows that the representation of all mice needs to be something like (26): For any x, if x is a mouse, then there is something else being said of it, still to be filled in.
Such “logical representations with holes” are usually constructed using typed lambda calculus, as shown in (27). The hole now has a label, P, and the λP at the beginning states this fact.
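To make the mechanics concrete, the lambda terms can be mimicked with Python functions evaluated over a small model (the entities and predicate extensions below are invented for illustration). The function all_mice plays the role of the term in (27), λP.∀x (mouse′(x) → P(x)): it waits for a predicate P to fill its hole.

```python
# A tiny hypothetical model: a domain of entities and two predicate extensions.
entities = {"mickey", "jerry", "tom"}
mouse = {"mickey", "jerry"}  # extension of mouse'
fast = {"mickey", "jerry"}   # extension of fast'

# "all mice", as in (27): takes a predicate P and checks it of every mouse
all_mice = lambda P: all(P(x) for x in entities if x in mouse)

# "are fast": a predicate, true of the entities in fast
are_fast = lambda x: x in fast

# S node, as in Figure 7: apply one representation to the other
print(all_mice(are_fast))              # True: every mouse is fast
print(all_mice(lambda x: x == "tom"))  # False: no mouse is tom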
2.3.3 Interfacing With Formal Representations
Logic-based and logic-like meaning representations are particularly suited for interfacing with systems that internally already use a logic-based or logic-like representation, such as robots or databases. The idea is that the user gives a command to the robot, which is automatically translated to logic, and because the robot internally uses logic, it can then directly follow the command. Likewise, a user could state a natural language question to a database, which is automatically translated to the logic-like database query language and executed, and the result returned to the user. The task of deriving formal meaning representations that interface with a robot, a database, or another logic-based system is called semantic parsing (Zelle & Mooney, 1996). The earliest datasets for the task involved a geography database and an advice language for simulated soccer robots. Because these datasets pair natural language commands with their formal translations, they allow semantic parsing to be phrased as a machine learning problem: learning to map natural language sentences to formal representations by observing correct mappings of this kind in the training data (Zettlemoyer & Collins, 2005; Ge & Mooney, 2006; Kate & Mooney, 2007; Wong & Mooney, 2007). Semantic parsing has expanded to include more tasks, such as parsing queries to a flight information database (Zettlemoyer & Collins, 2007), queries to the Freebase database (Berant et al., 2013), or program specifications (Lei et al., 2013). These tasks introduced new challenges, such as ungrammatical natural language utterances and the need to map to a very large ontology. Semantic parsing then took on yet another challenge with tasks where there is no training data pairing natural language sentences with formal representations.
The only information available is feedback on whether a task was executed successfully, so the learner has to infer the correctness of its formal representations from the success of its actions (Branavan et al., 2012; Artzi & Zettlemoyer, 2013; Kwiatkowski et al., 2013).
2.3.4 Compositionality and Distributional Models
Distributional models have been very successful in modeling word similarity, as described in Section 2.2. Mitchell and Lapata (2010) raised the very natural question of whether distributional models of phrases can be equally successful in modeling phrase similarity. More specifically, they asked how distributional phrase representations could be constructed compositionally from representations of their components, drawing on the notion of compositionality from semantics construction. Mitchell and Lapata (2010) championed a simple approach, component-wise multiplication of vectors, which works particularly well for short phrases. Baroni and Zamparelli (2010) and Coecke et al. (2010) both propose systems based on tensors (higher-order generalizations of vectors), with different tensors representing different types of phrases, and Grefenstette and Sadrzadeh (2011) and Grefenstette et al. (2013) work this out further in practice. There are also deep learning based approaches that learn representations that are suitable to phrase composition (Socher et al., 2011; Pham et al., 2015). Compositional distributional models have been used to study a variety of linguistic phenomena, such as noun modification (Boleda et al., 2013; Kruszewski & Baroni, 2014), semantic anomaly (Vecchi et al., 2011), and morpheme meaning (Marelli & Baroni, 2015).
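Mitchell and Lapata's component-wise multiplication is simple enough to sketch directly (the toy count vectors below are hypothetical): the phrase vector keeps only dimensions on which both words have mass, which is one intuition for why the operation works well as an intersection of the words' contexts.

```python
# Component-wise multiplication of two sparse word vectors, as in
# Mitchell and Lapata (2010); dimensions missing from either vector
# contribute nothing and are dropped.
def multiply(u, v):
    """Element-wise product over the shared dimensions of two vectors."""
    return {d: u[d] * v[d] for d in u if d in v}

# Hypothetical count vectors for "old" and "dog"
old = {"sleep": 2, "fry": 1, "wrinkled": 3}
dog = {"sleep": 4, "fry": 0, "bark": 5}

print(multiply(old, dog))  # {'sleep': 8, 'fry': 0}
```

The resulting phrase vector for old dog can then be compared to other word or phrase vectors with the same cosine measure used for single words.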
2.3.5 Logic and Weights
There have been many approaches that combine logic and weights. A seminal paper was Hobbs et al. (1993), which considered the interpretation of natural language sentences as abductive reasoning, that is, reasoning “backwards” from observations to possible underlying explanations. They express the “cost” of an inference as a weight on the corresponding inference rule, and use these weighted rules to find the simplest possible interpretation of a natural language sentence as its cheapest explanation. Other approaches do weighted deductive reasoning. In this framework, a system is given a collection of entities, along with some predicates that are true of them, and probabilistically explores which other predicates may be true of each entity. In this, it is guided by weighted logical rules that state likely inferences, where the weights may be learned from a knowledge base, based on what fraction of the instances in the knowledge base obey a rule. This general idea can be implemented using statistical relational learning (Getoor & Taskar, 2007), neural networks (Towell et al., 1990; Eliassi-Rad, 2001), or programming languages for generative models (Goodman et al., 2008). Conversely, distributed representations (that is, representations that model a word or phrase through a large vector of features, for example distributional models) can be constrained to (partially) obey logical rules. For example, vectors that represent words in a way that aids named entity tagging can be constrained by rules like “organization names don’t come right before person names” (Rocktäschel et al., 2015b; Hu et al., 2016). Statistical relational learning in particular has been used in computational semantics and its applications, for paraphrase induction (Poon & Domingos, 2009), event extraction (Riedel et al., 2009), textual entailment (Beltagy et al., 2013), and question answering (Khot et al., 2015).
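The weighted deductive setting can be illustrated with a minimal sketch in the spirit of statistical relational learning (the rule, its weight, and the two possible worlds below are hypothetical): a world's probability is proportional to the exponential of the total weight of the weighted rules it satisfies, so a soft rule shifts probability mass without making violations impossible.

```python
import math

# Two possible worlds: does tweety fly or not?
worlds = [
    {"bird(tweety)": True, "flies(tweety)": True},
    {"bird(tweety)": True, "flies(tweety)": False},
]

def rule_bird_flies(w):
    """Soft rule bird(x) -> flies(x), here grounded for tweety."""
    return (not w["bird(tweety)"]) or w["flies(tweety)"]

rules = [(1.5, rule_bird_flies)]  # (weight, grounded rule)

def score(w):
    """Total weight of the rules satisfied in world w."""
    return sum(weight for weight, rule in rules if rule(w))

z = sum(math.exp(score(w)) for w in worlds)  # normalizing constant
for w in worlds:
    print(w["flies(tweety)"], round(math.exp(score(w)) / z, 3))
# prints: True 0.818 / False 0.182
```

Raising the rule's weight pushes the flying world's probability toward 1; an infinite weight would recover the hard logical implication.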
2.4 Questions and Debates
The field of computational semantics mainly advances through short, focused technical contributions. But there are some larger questions that come up again and again and that are being debated, step by small step, within these technical contributions. This section will highlight some of them.
General versus task-specific representations. Is it better to use general or task-specific representations? General representations, like WordNet, or distributional representations like LSA (Landauer & Dumais, 1997) or word2vec (Mikolov et al., 2013), can be reused across tasks. On the other hand, task-specific representations can be more focused and achieve superior performance. For word sense disambiguation, Kilgarriff (1997) argues that there are no task-independent senses.
The mystery of joint prediction. Some pairs of tasks should naturally benefit each other, so that addressing them jointly should lead to advances on both, like dependency parsing and semantic role labeling (Hajič et al., 2009). Yet progress with joint models remains very difficult to achieve.
Supervised and unsupervised learning. Should we rely solely on human-generated resources (dictionaries and taxonomies, rules, and annotated data) as a basis for computational semantics? After all, supervised approaches, which build on a higher-quality, less noisy basis, almost always beat unsupervised approaches. Or should we solely use unsupervised methods? After all, lexical phenomena are hard to characterize in a rule-based fashion, and unsupervised approaches can provide much larger volumes of data for data-hungry tasks like textual entailment or question answering (or basically any task that needs access to paraphrases). Or is the best approach to combine human insight with wide-coverage corpus-based approaches, to combine the precision of human-provided information with the breadth of corpus-based approaches? This is a choice that reappears across phenomena and across tasks, and is being addressed with every technical contribution.
Learning and linguistic analysis. A related but more fundamental question is whether there is a role for in-depth linguistic guidance during learning at all. This is an old question (Hirschberg, 1998), but a question that is being discussed anew with the advent of deep learning models (Manning, 2015). Deep learning models can learn task-appropriate feature representations, and they can be designed to be modular with different components that are being trained together, which makes them powerful learners. They have been used to develop end-to-end systems for complex applications, for example for machine translation (Cho et al., 2014), textual entailment (Bowman et al., 2015a), and text comprehension (Hermann et al., 2015). Some of these systems are trained with raw text input, without guidance by linguistic analysis, while others use linguistic analysis in new ways to guide learning. The one thing that seems certain is that new insights into this question of the relation of learning and linguistic analysis can be expected soon.
This section provides pointers for further reading. First, a very general pointer to a resource for computational linguistics: the Association for Computational Linguistics (ACL) provides all papers from its main conferences, journals, and workshops for free on a single site, the ACL Anthology.
This is an excellent, searchable resource, and ideally suited for getting an overview of recent work in the field as well as for diving more deeply into a sub-area.
We continue with a few pointers for each of the themes discussed above. This article started out with a discussion of phenomena and prominent analysis tasks. The survey articles by Navigli (2009) and McCarthy (2009) provide a good starting point for word sense disambiguation. Das et al. (2014) is a recent paper on semantic role labeling that exemplifies the individual parts of the problem and possible solutions. For multiword expressions, the best starting points are the handbook article of Baldwin and Kim (2010) and the famous “pain in the neck” paper of Sag et al. (2002). For coreference resolution, Poesio et al. (2015) shows the history of the task from its origins to recent algorithms. Dagan et al. (2013) discuss Textual Entailment, a “meta-task” whose challenging mixture of phenomena has driven the field forward.
The first of the three core strands of research discussed here was semantic resources. The WordNet book (Fellbaum, 1998) provides a great overview of the electronic dictionary that has been so influential to the field. For semantic roles, Fillmore et al. (2003) and Palmer et al. (2005) are good introductions to two of the main resources. There are also documents at the International Organization for Standardization (ISO) about annotation formats for several semantic phenomena, which can be found by searching for “SemAF.” At this point, there are documents about semantic annotation principles, time and events, semantic roles, discourse structure, and dialog acts.
The second core theme was unsupervised approaches. Hearst (1992) is the seminal paper for pattern-based methods for extracting semantic relations. Turney and Pantel (2010) is a survey article on count-based distributional models, Blei et al. (2003) is the seminal paper on Bayesian topic modeling, and Mikolov et al. (2013) describes a word embedding approach that is widely used as is, and has also been the starting point for many later models. Bannard and Callison-Burch (2005) is the paper that first proposed using parallel data for the induction of paraphrases.
For logic-based approaches, the in-depth overview article of Bos (2011) provides the best possible starting point.
In addition, Jurafsky and Martin (2009) is a textbook that covers many of the core techniques of semantic processing, and thus is a good starting point for experiments.
Agirre, E., Lacalle, d. O. L., and Soroa, A. (2014). Random walks for knowledge-based word sense disambiguation. Computational Linguistics, 40(1), 57–84.Find this resource:
Artzi, Y., & Zettlemoyer, L. (2013). Weakly supervised learning of semantic parsers for mapping instructions to actions. Transaction of the Association for Computational Linguistics (TACL), 1, 49–62.Find this resource:
Baader, F. (2003). The description logic handbook: Theory, implementation, and applications. Cambridge, UK: Cambridge University Press.
Baldwin, T., & Kim, S. N. (2010). Multiword expressions. In N. Indurkhya & F. Damerau (Eds.), Handbook of natural language processing (2d ed., pp. 267–292). Boca Raton, FL: CRC Press.
Banarescu, L., Bonial, C., Cai, S., Georgescu, M., Griffitt, K., Hermjakob, U., . . . Schneider, N. (2013). Abstract Meaning Representation for sembanking. In Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse (pp. 178–186). New York: Association for Computational Linguistics.
Bannard, C., & Callison-Burch, C. (2005). Paraphrasing with bilingual parallel corpora. In Proceedings of the 43rd annual meeting of the Association for Computational Linguistics (ACL’05) (pp. 597–604). New Brunswick, NJ: Association for Computational Linguistics.
Baroni, M., Dinu, G., & Kruszewski, G. (2014). Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors. In Proceedings of the 52nd annual meeting of the Association for Computational Linguistics (pp. 238–247). Stroudsburg, PA: Association for Computational Linguistics.
Baroni, M., & Zamparelli, R. (2010). Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (pp. 1183–1193). Cambridge, MA: Association for Computational Linguistics.
Beltagy, I., Chau, C., Boleda, G., Garrette, D., Erk, K., & Mooney, R. (2013). Montague meets Markov: Deep semantics with probabilistic logical form. In Proceedings of the second joint Conference on Lexical and Computational Semantics (*SEM 2013) (pp. 1–21). Red Hook, NY: Curran.
Berant, J., Alon, N., Dagan, I., & Goldberger, J. (2015). Efficient global learning of entailment graphs. Computational Linguistics, 41(2), 221–264.
Berant, J., Chou, A., Frostig, R., & Liang, P. (2013). Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1533–1544). Seattle: Association for Computational Linguistics.
Blackburn, P., & Bos, J. (2005). Representation and inference for natural language: A first course in computational semantics. Stanford: Center for the Study of Language and Information.
Blei, D., Ng, A., & Jordan, M. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(4–5), 993–1022.
Boleda, G., Baroni, M., Pham, T. N., & McNally, L. (2013). Intensionality was only alleged: On adjective-noun composition in distributional semantics. In Proceedings of the 10th international Conference on Computational Semantics (IWCS 2013)—Long Papers (pp. 35–46). Stroudsburg, PA: Association for Computational Linguistics.
Borin, L., Dannélls, D., Forsberg, M., Toporowska Gronostaj, M., & Kokkinakis, D. (2010). The past meets the present in Swedish FrameNet++. In L. Ahrenberg (Ed.), Swedish Language Technology Conference 2010 (pp. 269–281). Stockholm: Språkrådet.
Bos, J. (2011). A survey of computational semantics: Representation, inference and knowledge in wide-coverage text understanding. Language and Linguistics Compass, 5(6), 336–366.
Bos, J., Basile, V., Evang, K., Venhuizen, N., & Bjerva, J. (2017). The Groningen meaning bank. In N. Ide & J. Pustejovsky (Eds.), Handbook of linguistic annotation (pp. 463–496). Berlin: Springer.
Bos, J., & Markert, K. (2005). Recognising textual entailment with logical inference. In J. Quiñonero-Candela, I. Dagan, B. Magnini, & F. D’Alché-Buc (Eds.), Machine learning challenges: Evaluating predictive uncertainty, visual object classification, and recognising textual entailment (pp. 404–426). Berlin: Springer.
Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015a). A large annotated corpus for learning natural language inference. In Proceedings of the 2015 conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 632–642). Red Hook, NY: Curran.
Bowman, S. R., Potts, C., & Manning, C. D. (2015b). Learning distributed word representations for natural logic reasoning. In Knowledge representation and reasoning: Integrating symbolic and neural approaches: Papers from the 2015 AAAI Spring Symposium (pp. 10–13). Palo Alto, CA: AAAI Press.
Boyd-Graber, J., Blei, D., & Zhu, X. (2007). A topic model for word sense disambiguation. In Proceedings of the 2007 conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1024–1033). Stroudsburg, PA: Association for Computational Linguistics.
Branavan, S. R. K., Silver, D., & Barzilay, R. (2012). Learning to win by reading manuals in a Monte-Carlo framework. Journal of Artificial Intelligence Research, 43(1), 661–704.
Brown, S. W. (2008). Choosing sense distinctions for WSD: Psycholinguistic evidence. In Proceedings of the 46th annual meeting of the Association for Computational Linguistics: Human language technologies (pp. 249–252). Stroudsburg, PA: Association for Computational Linguistics.
Bruni, E., Boleda, G., Baroni, M., & Tran, N. K. (2012). Distributional semantics in Technicolor. In Proceedings of the 50th annual meeting of the Association for Computational Linguistics (Volume 1: Long papers) (pp. 136–145). Stroudsburg, PA: Association for Computational Linguistics.
Budanitsky, A., & Hirst, G. (2006). Evaluating WordNet-based measures of lexical semantic relatedness. Computational Linguistics, 32(1), 13–47.
Chen, D. L., & Mooney, R. J. (2008). Learning to sportscast: A test of grounded language acquisition. In W. Cohen (Ed.), Proceedings of the 25th International Conference on Machine Learning (ICML) (pp. 128–135). New York: ACM.
Cho, K., van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN Encoder-Decoder for statistical machine translation. In Proceedings of the 2014 conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1724–1734). Red Hook, NY: Curran.
Choi, Y., & Cardie, C. (2008). Learning with compositional semantics as structural inference for subsentential sentiment analysis. In Proceedings of the 2008 conference on Empirical Methods in Natural Language Processing (pp. 793–801). Stroudsburg, PA: Association for Computational Linguistics.
Choueka, Y. (1988). Looking for needles in a haystack or locating interesting collocational expressions in large textual databases. In Proceedings of the 2nd international Conference on Computer-Assisted Information Retrieval (RIA’88) (pp. 609–624). Paris: CID.
Coecke, B., Sadrzadeh, M., & Clark, S. (2010). Mathematical foundations for a compositional distributed model of meaning. Linguistic Analysis, 36, 345–384.
Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. In A. K. McCallum & S. Roweis (Eds.), ICML 2008: Proceedings of the twenty-fifth International Conference on Machine Learning (pp. 160–167). New York: ACM.
Cooper, R., Crouch, D., van Eijck, J., Fox, C., van Genabith, J., Jaspars, J., . . . Konrad, K. (1996). Using the framework. Technical Report LRE 62-051 D-16, The FraCaS Consortium.
Copestake, A., Flickinger, D., Sag, I., & Pollard, C. (2005). Minimal recursion semantics: An introduction. Research on Language and Computation, 3(2–3), 281–332.
Cruse, D. A. (1995). Polysemy and related phenomena from a cognitive linguistic viewpoint. In P. Saint-Dizier & E. Viegas (Eds.), Computational lexical semantics: Studies in natural language processing (pp. 33–49). Cambridge, UK: Cambridge University Press.
Dagan, I., Roth, D., Sammons, M., & Zanzotto, F. M. (2013). Recognizing textual entailment: Models and applications. San Rafael, CA: Morgan & Claypool.
Das, D., Chen, D., Martins, A. F. T., Schneider, N., & Smith, N. A. (2014). Frame-semantic parsing. Computational Linguistics, 40(1), 9–56.
Davidson, D. (1967). The logical form of action sentences. In N. Rescher (Ed.), The logic of decision and action (pp. 81–95). Pittsburgh: University of Pittsburgh Press.
Dinu, G., & Lapata, M. (2010). Measuring distributional similarity in context. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1162–1172). Stroudsburg, PA: Association for Computational Linguistics.
Dowty, D. (1989). On the semantic content of the notion of thematic role. In G. Chierchia, B. Partee, & R. Turner (Eds.), Properties, types and meaning (pp. 69–129). Dordrecht: Kluwer.
Dowty, D. R., Wall, R. E., & Peters, S. (1981). Introduction to Montague semantics. Dordrecht: D. Reidel.
Edmonds, P., & Hirst, G. (2002). Near-synonymy and lexical choice. Computational Linguistics, 28(2), 105–144.
Egg, M., Koller, A., & Niehren, J. (2001). The constraint language for lambda structures. Journal of Logic, Language, and Information, 10(4), 457–485.
Eliassi-Rad, T. (2001). Building intelligent agents that learn to retrieve and extract information. (Unpublished doctoral dissertation). University of Wisconsin, Madison.
Erk, K. (2012). Vector space models of word meaning and phrase meaning: A survey. Language and Linguistics Compass, 6(10), 635–653.
Erk, K., & Padó, S. (2008). A structured vector space model for word meaning in context. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (pp. 897–906). Stroudsburg, PA: Association for Computational Linguistics.
Erk, K., Padó, S., & Padó, U. (2010). A flexible, corpus-driven model of regular and inverse selectional preferences. Computational Linguistics, 36(4), 723–763.
Etzioni, O., Cafarella, M., Downey, D., Kok, S., Popescu, A., Shaked, T., . . . Yates, A. (2004). Web-scale information extraction in KnowItAll. In Proceedings of the 13th international conference on World Wide Web (pp. 100–110). New York: ACM.
Evert, S. (2005). The statistics of word cooccurrences: Word pairs and collocations. (Unpublished doctoral dissertation). Stuttgart University.
Farhadi, A., Hejrati, M., Sadeghi, M., Young, P., Rashtchian, C., Hockenmaier, J., & Forsyth, D. (2010). Every picture tells a story: Generating sentences from images. In K. Daniilidis, P. Maragos, & N. Paragios (Eds.), 11th European Conference on Computer Vision, Heraklion, Crete, Greece, September 5–11, 2010, Proceedings, Part IV (pp. 15–29). Berlin: Springer.
Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., & Smith, N. A. (2015). Retrofitting word vectors to semantic lexicons. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (pp. 1606–1615). Red Hook, NY: Curran.
Fellbaum, C. (Ed.). (1998). WordNet: An electronic lexical database. Cambridge, MA: MIT Press.
Feng, Y., & Lapata, M. (2010). Topic models for image annotation and text illustration. In Human language technologies: The 2010 annual conference of the North American chapter of the Association for Computational Linguistics (pp. 831–839). Stroudsburg, PA: Association for Computational Linguistics.
Fillmore, C. (1968). The case for case. In E. Bach & R. Harms (Eds.), Universals in linguistic theory (pp. 21–119). New York: Holt, Rinehart and Winston.
Fillmore, C. J., Johnson, C. R., & Petruck, M. R. (2003). Background to FrameNet. International Journal of Lexicography, 16(3), 235–250.
Frantzi, K., Ananiadou, S., & Tsujii, J. (1998). The C-value/NC-value method of automatic recognition for multi-word terms. In Proceedings of the second European Conference on Research and Advanced Technology for Digital Libraries (ECDL) (pp. 585–604). Berlin: Springer.
Ganitkevitch, J., VanDurme, B., & Callison-Burch, C. (2013). PPDB: The paraphrase database. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics (pp. 758–764). Red Hook, NY: Curran.
Ge, R., & Mooney, R. J. (2006). Discriminative reranking for semantic parsing. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL-06) (pp. 263–270). Stroudsburg, PA: Association for Computational Linguistics.
Gerber, M., & Chai, J. (2010). Beyond NomBank: A study of implicit arguments for nominal predicates. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (pp. 1583–1592). Stroudsburg, PA: Association for Computational Linguistics.
Getoor, L., & Taskar, B. (2007). Introduction to statistical relational learning. Cambridge, MA: MIT Press.
Gildea, D., & Jurafsky, D. (2002). Automatic labeling of semantic roles. Computational Linguistics, 28(3), 245–288.
Girju, R., Badulescu, A., & Moldovan, D. (2006). Automatic discovery of part-whole relations. Computational Linguistics, 32(1), 83–135.
Goodman, N., Mansinghka, V., Roy, D., Bonawitz, K., & Tenenbaum, J. B. (2008). Church: A language for generative models. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (pp. 220–229). Corvallis, OR: AUAI Press.
Grefenstette, E., Dinu, G., Zhang, Y., Sadrzadeh, M., & Baroni, M. (2013). Multistep regression learning for compositional distributional semantics. In H. Bunt (Ed.), Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013)—Long papers (pp. 131–142). Red Hook, NY: Curran.
Grefenstette, E., & Sadrzadeh, M. (2011). Experimental support for a categorical compositional distributional model of meaning. In Proceedings of the 2011 conference on Empirical Methods in Natural Language Processing (pp. 1394–1404). Stroudsburg, PA: Association for Computational Linguistics.
Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M. A., Màrquez, L., . . . Zhang, Y. (2009). The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared task (pp. 1–18). Stroudsburg, PA: Association for Computational Linguistics.
Hearst, M. A. (1992). Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics (COLING 1992) (pp. 539–545). Stroudsburg, PA: Association for Computational Linguistics.
Hermann, K. M., Kocisky, T., Grefenstette, E., Espeholt, L., Kay, W., Suleyman, M., & Blunsom, P. (2015). Teaching machines to read and comprehend. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, & R. Garnett (Eds.), Advances in neural information processing systems: 28th annual Conference on Neural Information Processing Systems 2015 (pp. 1684–1692). Red Hook, NY: Curran.
Hirschberg, J. (1998). “Every time I fire a linguist, my performance goes up,” and other myths of the statistical natural language processing revolution. Invited talk, Fifteenth National Conference on Artificial Intelligence (AAAI-98).
Hobbs, J. R., Stickel, M. E., Appelt, D. E., & Martin, P. (1993). Interpretation as abduction. Artificial Intelligence, 63(1), 69–142.
Hu, Z., Ma, X., Liu, Z., Hovy, E., & Xing, E. (2016). Harnessing deep neural networks with logic rules. In 54th annual meeting of the Association for Computational Linguistics (ACL) (pp. 2410–2420). Red Hook, NY: Curran.
Jackendoff, R. (1997). The architecture of the language faculty. Linguistic Inquiry Monographs 28. Cambridge, MA: MIT Press.
Jones, B., Andreas, J., Bauer, D., Hermann, K. M., & Knight, K. (2012). Semantics-based machine translation with hyperedge replacement grammars. In Proceedings of COLING 2012 (pp. 1359–1376). Mumbai: The COLING 2012 Organizing Committee.
Jurafsky, D., & Martin, J. (2009). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition (2d ed.). Upper Saddle River, NJ: Prentice-Hall.
Kamp, H. (1995). Discourse representation theory. In J. Verschueren, J.-O. Östman, & J. Blommaert (Eds.), Handbook of pragmatics (pp. 253–257). Amsterdam: John Benjamins.
Kamp, H., & Reyle, U. (1993). From discourse to logic. Dordrecht: Kluwer.
Kate, R. J., & Mooney, R. J. (2007). Semi-supervised learning for semantic parsing using support vector machines. In Proceedings of the Human Language Technology Conference of the North American chapter of the Association for Computational Linguistics, Short papers (NAACL/HLT-2007) (pp. 81–84). Stroudsburg, PA: Association for Computational Linguistics.
Khot, T., Balasubramanian, N., Gribkoff, E., Sabharwal, A., Clark, P., & Etzioni, O. (2015). Exploring Markov Logic Networks for question answering. In Proceedings of the conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 685–694). Stroudsburg, PA: Association for Computational Linguistics.
Kilgarriff, A. (1997). “I don’t believe in word senses.” Computers and the Humanities, 31, 91–113.
Kilgarriff, A. (1998). Gold standard datasets for evaluating word sense disambiguation programs. Computer Speech and Language, 12(3), 453–472.
Kipper-Schuler, K. (2005). VerbNet: A broad-coverage, comprehensive verb lexicon. (Unpublished doctoral dissertation). University of Pennsylvania, Philadelphia.
Koehn, P. (2005). Europarl: A parallel corpus for statistical machine translation. In The tenth Machine Translation Summit: September 13–15, 2005, proceedings of conference MT Summit X, Phuket, Thailand (pp. 79–86). Tokyo: Asia-Pacific Association for Machine Translation.
Kremer, G., Erk, K., Padó, S., & Thater, S. (2014). What substitutes tell us: Analysis of an “all-words” lexical substitution corpus. In Proceedings of the 14th conference of the European Chapter of the Association for Computational Linguistics (pp. 540–549). Gothenburg: Association for Computational Linguistics.
Kruszewski, G., & Baroni, M. (2014). Dead parrots make bad pets: Exploring modifier effects in noun phrases. In J. Bos, A. Frank, & R. Navigli (Eds.), Proceedings of the third joint conference on Lexical and Computational Semantics (*SEM 2014) (pp. 171–181). Dublin: Association for Computational Linguistics.
Kruszewski, G., & Baroni, M. (2015). So similar and yet incompatible: Toward automated identification of semantically compatible words. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 964–969). Stroudsburg, PA: Association for Computational Linguistics.
Kumar, A., Irsoy, O., Ondruska, P., Iyyer, M., Bradbury, J., Gulrajani, I., . . . Socher, R. (2016). Ask me anything: Dynamic memory networks for natural language processing. In Proceedings of the 33rd International Conference on Machine Learning (ICML) (pp. 1378–1387). New York: ACM.
Kwiatkowski, T., Choi, E., Artzi, Y., & Zettlemoyer, L. (2013). Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of the 2013 conference on Empirical Methods in Natural Language Processing (pp. 1545–1556). Seattle: Association for Computational Linguistics.
Lakoff, G., & Johnson, M. (1980). Metaphors we live by. Chicago: University of Chicago Press.
Landauer, T., & Dumais, S. (1997). A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2), 211–240.
Lang, J., & Lapata, M. (2014). Similarity-driven semantic role induction via graph partitioning. Computational Linguistics, 40(3).
Lapata, M., & Lascarides, A. (2003). A probabilistic account of logical metonymy. Computational Linguistics, 29(2), 263–317.
Lee, H., Recasens, M., Chang, A., Surdeanu, M., & Jurafsky, D. (2012). Joint entity and event coreference resolution across documents. In Proceedings of the 2012 joint conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 489–500). Stroudsburg, PA: Association for Computational Linguistics.
Lei, T., Long, F., Barzilay, R., & Rinard, M. (2013). From natural language specifications to program input parsers. In Proceedings of the 51st annual meeting of the Association for Computational Linguistics (Volume 1: Long papers) (pp. 1294–1303). Red Hook, NY: Curran.
Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of the 5th annual international Conference on Systems Documentation (pp. 24–26). New York: ACM.
Levin, B. (1993). English verb classes and alternations: A preliminary investigation. Chicago: University of Chicago Press.
Levy, O., & Goldberg, Y. (2014). Neural word embeddings as implicit matrix factorization. In Advances in neural information processing systems: 27th annual Conference on Neural Information Processing Systems (pp. 2177–2185). Red Hook, NY: Curran.
Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving distributional similarity with lessons learned from word embeddings. Transactions of the Association for Computational Linguistics, 3(1), 211–225.
Liang, P., Jordan, M. I., & Klein, D. (2011). Learning dependency-based compositional semantics. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL) (pp. 590–599). Stroudsburg, PA: Association for Computational Linguistics.
Lin, D. (1998). Automatic retrieval and clustering of similar words. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics (COLING ACL) (pp. 768–774). Stroudsburg, PA: Association for Computational Linguistics.
Lin, D., & Pantel, P. (2001). Discovery of inference rules for question answering. Natural Language Engineering, 7(4), 343–360.
MacCartney, B., & Manning, C. D. (2009). An extended model of natural logic. In Proceedings of the 8th international Conference on Computational Semantics (IWCS 2009) (pp. 140–156). Stroudsburg, PA: Association for Computational Linguistics.
Manning, C. D. (2015). Computational linguistics and deep learning. Computational Linguistics, 41(4), 701–707.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Marelli, M., & Baroni, M. (2015). Affixation in semantic space: Modeling morpheme meanings with compositional distributional semantics. Psychological Review, 122(3), 485–515.
Marelli, M., Menini, S., Baroni, M., Bentivogli, L., Bernardi, R., & Zamparelli, R. (2014). A SICK cure for the evaluation of compositional distributional semantic models. In Proceedings of the 9th conference on Language Resources and Evaluation (LREC 2014). Paris: European Language Resources Association.
Màrquez, L., Carreras, X., Litkowski, K., & Stevenson, S. (2008). Semantic role labeling: An introduction to the special issue. Computational Linguistics, 34(2), 145–159.
McCarthy, D. (2009). Word sense disambiguation: An overview. Language and Linguistics Compass, 3(2), 537–558.
McCarthy, D., Koeling, R., Weeds, J., & Carroll, J. (2004). Finding predominant word senses in untagged text. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL’04) (pp. 279–286). Barcelona: Association for Computational Linguistics.
McCarthy, D., & Navigli, R. (2009). The English lexical substitution task. Language Resources and Evaluation, 43(2), 139–159.
Medelyan, O., Milne, D., Legg, C., & Witten, I. (2009). Mining meaning from Wikipedia. International Journal on Human-Computer Studies, 67, 716–754.
Melamud, O., Levy, O., & Dagan, I. (2015). A simple word embedding model for lexical substitution. In Proceedings of the workshop on vector space modeling for NLP at the 2015 Conference of the North American Chapter of the Association for Computational Linguistics—Human Language Technologies (NAACL HLT 2015) (pp. 1–7). Stroudsburg, PA: Association for Computational Linguistics.
Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., & Grishman, R. (2004). Annotating noun argument structure for NomBank. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC). Paris: European Language Resources Association.
Mikolov, T., Yih, W.-t., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) (pp. 746–751). Stroudsburg, PA: Association for Computational Linguistics.
Mitchell, J., & Lapata, M. (2010). Composition in distributional models of semantics. Cognitive Science, 34(8), 1388–1429.
Montague, R. (1970). Universal grammar. Theoria, 36, 373–398.
Morante, R., & Blanco, E. (2012). *SEM 2012 shared task: Resolving the scope and focus of negation. In SEM 2012: The first joint Conference on Lexical and Computational Semantics (SemEval 2012) (pp. 265–274). Red Hook, NY: Curran.
Nakov, P., & Hearst, M. (2013). Semantic interpretation of noun compounds using verbal and other paraphrases. ACM Transactions on Speech and Language Processing (TSLP), 10(3).
Navigli, R. (2009). Word sense disambiguation: A survey. ACM Computing Surveys, 41(2), 1–69.
Navigli, R., & Ponzetto, S. P. (2012). BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193, 217–250.
Ohara, K. (2012). Semantic annotations in Japanese FrameNet: Comparing frames in Japanese and English. In Proceedings of the eighth international conference on Language Resources and Evaluation (LREC). Paris: European Language Resources Association.
Ó Séaghdha, D., & Korhonen, A. (2014). Probabilistic distributional semantics with latent variable models. Computational Linguistics, 40(3), 587–631.
Padó, S., & Lapata, M. (2005). Cross-lingual projection of role-semantic information. In Proceedings of the Human Language Technology Conference/Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), Vancouver, BC. Red Hook, NY: Curran.
Palmer, M., Babko-Malaya, O., Bies, A., Diab, M., Maamouri, M., Mansouri, A., & Zaghouani, W. (2008). A pilot Arabic PropBank. In Proceedings of the sixth international conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco. Paris: European Language Resources Association.
Palmer, M., Gildea, D., & Kingsbury, P. (2005). The Proposition Bank: A corpus annotated with semantic roles. Computational Linguistics, 31(1).
Palmer, M. S., Dang, H. T., & Fellbaum, C. (2007). Making fine-grained and coarse-grained sense distinctions, both manually and automatically. Natural Language Engineering, 13(2).
Pantel, P., & Pennacchiotti, M. (2006). Espresso: Leveraging generic patterns for automatically harvesting semantic relations. In Proceedings of the joint conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics (COLING-ACL) (pp. 113–120). Stroudsburg, PA: Association for Computational Linguistics.
Parsons, T. (1990). Events in the semantics of English. Cambridge, MA: MIT Press.
Pavlick, E., Rastogi, P., Ganitkevitch, J., Van Durme, B., & Callison-Burch, C. (2015). PPDB 2.0: Better paraphrase ranking, fine-grained entailment relations, word embeddings, and style classification. In Proceedings of the 53rd annual meeting of the Association for Computational Linguistics (ACL), Beijing, China (pp. 425–430). Red Hook, NY: Curran.
Pedersen, T., Patwardhan, S., & Michelizzi, J. (2004). WordNet::Similarity: Measuring the relatedness of concepts. In Proceedings of the fifth annual meeting of the North American chapter of the Association for Computational Linguistics (NAACL) (pp. 38–41). Stroudsburg, PA: Association for Computational Linguistics.
Pennacchiotti, M., & Pantel, P. (2009). Entity extraction via ensemble semantics. In Proceedings of the 2009 conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 238–247). Morristown, NJ: Association for Computational Linguistics.
Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543). Morristown, NJ: Association for Computational Linguistics.
Pham, N., Kruszewski, G., Lazaridou, A., & Baroni, M. (2015). Jointly optimizing word representations for lexical and sentential tasks with the C-PHRASE model. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015) (pp. 971–981). Stroudsburg, PA: Association for Computational Linguistics.
Poesio, M., Stuckardt, R., & Versley, Y. (2015). Anaphora resolution: Algorithms, resources, and applications. Berlin: Springer.
Poon, H., & Domingos, P. (2009). Unsupervised semantic parsing. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1–10). Morristown, NJ: Association for Computational Linguistics.
Pradhan, S., Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., & Weischedel, R. (2007). OntoNotes: A unified relational semantic representation. In Proceedings of the First IEEE International Conference on Semantic Computing (ICSC) (pp. 405–419). Los Alamitos, CA: IEEE Computer Society.
Pustejovsky, J. (1995). The generative lexicon. Cambridge, MA: MIT Press.
Pustejovsky, J., Hanks, P., Saurí, R., See, A., Gaizauskas, R., Setzer, A., . . . Lazo, M. (2003). The TIMEBANK corpus. In D. Archer, P. Rayson, A. Wilson, & T. McEnery (Eds.), Proceedings of the Corpus Linguistics 2003 conference (pp. 647–656). UCREL technical paper number 16. Lancaster, UK: UCREL, Lancaster University.
Ramisch, C., Villavicencio, A., & Kordoni, V. (2013). Introduction to the special issue on multiword expressions: From theory to practice and use. ACM Transactions on Speech and Language Processing (TSLP), 10(2).
Resnik, P. (1996). Selectional constraints: An information-theoretic model and its computational realization. Cognition, 61, 127–159.
Resnik, P., & Yarowsky, D. (2000). Distinguishing systems and distinguishing senses: New evaluation methods for word sense disambiguation. Natural Language Engineering, 5(2), 113–133.
Riedel, S., Chun, H.-W., Takagi, T., & Tsujii, J. (2009). A Markov Logic approach to bio-molecular event extraction. In Proceedings of the BioNLP 2009 Workshop, co-located with NAACL HLT 2009 (pp. 41–49). Morristown, NJ: Association for Computational Linguistics.
Riloff, E., & Jones, R. (1999). Learning dictionaries for information extraction by multi-level bootstrapping. In Proceedings of the 16th National Conference on Artificial Intelligence and 11th Innovative Applications of Artificial Intelligence Conference (AAAI/IAAI) (pp. 474–479). Menlo Park, CA: AAAI Press.
Rocktäschel, T., Singh, S., & Riedel, S. (2015b). Injecting logical background knowledge into embeddings for relation extraction. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics—Human Language Technologies (NAACL HLT 2015) (pp. 1119–1129). Stroudsburg, PA: Association for Computational Linguistics.
Sag, I., Baldwin, T., Bond, F., Copestake, A., & Flickinger, D. (2002). Multiword expressions: A pain in the neck for NLP. In Proceedings of Computational Linguistics and Intelligent Text Processing: Third International Conference (CICLing-2002), Lecture Notes in Computer Science 2276 (pp. 1–15). Berlin: Springer.
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613–620.
Schütze, H. (1998). Automatic word sense discrimination. Computational Linguistics, 24, 97–123.
Schank, R., & Tesler, L. (1969). A conceptual dependency parser for natural language. In Proceedings of the International Conference on Computational Linguistics (COLING), Sånga-Säby, Sweden. Stroudsburg, PA: Association for Computational Linguistics.
Shutova, E. (2013). Metaphor identification as interpretation. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM), Atlanta, GA (pp. 276–285). Stroudsburg, PA: Association for Computational Linguistics.
Shutova, E. (2015). Design and evaluation of metaphor processing systems. Computational Linguistics, 41(4), 579–623.
Silberer, C., Ferrari, V., & Lapata, M. (2013). Models of semantic representation with visual attributes. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria (pp. 91–99). Red Hook, NY: Curran.
Sinha, R., & Mihalcea, R. (2007). Unsupervised graph-based word sense disambiguation using measures of word semantic similarity. In Proceedings of the IEEE International Conference on Semantic Computing (ICSC), Irvine, CA (pp. 363–369). Los Alamitos, CA: IEEE Computer Society.
Snow, R., Jurafsky, D., & Ng, A. Y. (2006). Semantic taxonomy induction from heterogenous evidence. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL), Sydney, Australia (pp. 801–808). Stroudsburg, PA: Association for Computational Linguistics.
Socher, R., Huang, E. H., Pennin, J., Ng, A. Y., & Manning, C. D. (2011). Dynamic pooling and unfolding recursive autoencoders for paraphrase detection. In J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, & K. Weinberger (Eds.), Advances in neural information processing systems 24 (pp. 801–809). La Jolla, CA: Neural Information Processing Systems.
Socher, R., Huval, B., Manning, C. D., & Ng, A. Y. (2012). Semantic compositionality through recursive matrix-vector spaces. In Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1201–1211). Morristown, NJ: Association for Computational Linguistics.
Sowa, J. (1992). Semantic networks. In Encyclopedia of artificial intelligence. New York: Wiley.
Steyvers, M., & Griffiths, T. (2007). Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, & W. Kintsch (Eds.), Handbook of latent semantic analysis (pp. 427–448). Mahwah, NJ: Lawrence Erlbaum Associates.
Strube, M., & Ponzetto, S. P. (2006). WikiRelate! Computing semantic relatedness using Wikipedia. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (AAAI-06) and Eighteenth Innovative Applications of Artificial Intelligence Conference (IAAI-06) (pp. 1419–1424). Menlo Park, CA: AAAI Press.
Subirats, C. (2009). Spanish FrameNet: A frame-semantic analysis of the Spanish lexicon. In H. C. Boas (Ed.), Multilingual FrameNets in computational lexicography: Methods and applications (pp. 135–162). New York: Mouton de Gruyter.
Swier, R. S., & Stevenson, S. (2004). Unsupervised semantic role labelling. In D. Lin & D. Wu (Eds.), Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 95–102). Morristown, NJ: Association for Computational Linguistics.
Szarvas, G., Biemann, C., & Gurevych, I. (2013). Supervised all-words lexical substitution using delexicalized features. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) (pp. 1131–1141). Stroudsburg, PA: Association for Computational Linguistics.
Szpektor, I., Dagan, I., Bar-Haim, R., & Goldberger, J. (2008). Contextual preferences. In Proceedings of the Annual Meeting of the Association for Computational Linguistics and Human Language Technology Conference (ACL-HLT) (pp. 683–691). Stroudsburg, PA: Association for Computational Linguistics.
Täckström, O., Ganchev, K., & Das, D. (2015). Efficient inference and structured learning for semantic role labeling. Transactions of the Association for Computational Linguistics, 3, 29–41.
Towell, G. G., Shavlik, J. W., & Noordewier, M. O. (1990). Refinement of approximate domain theories by knowledge-based neural networks. In Proceedings of the Eighth National Conference on Artificial Intelligence (AAAI) (pp. 861–866). Menlo Park, CA: AAAI Press.
Turney, P. (2006). Similarity of semantic relations. Computational Linguistics, 32(3), 379–416.
Turney, P., & Pantel, P. (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188.
van Eijck, J., & Unger, C. (2010). Computational semantics with functional programming. Cambridge, UK: Cambridge University Press.
Veale, T., Shutova, E., & Klebanov, B. B. (2016). Metaphor: A computational perspective. San Rafael, CA: Morgan & Claypool.
Vecchi, E. M., Baroni, M., & Zamparelli, R. (2011). (Linear) maps of the impossible: Capturing semantic anomalies in distributional space. In Proceedings of the Workshop on Distributional Semantics and Compositionality at the Annual Conference of the Association for Computational Linguistics (pp. 1–9). Stroudsburg, PA: Association for Computational Linguistics.
Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., & Saenko, K. (2015). Translating videos to natural language using deep recurrent neural networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics—Human Language Technologies (NAACL HLT 2015) (pp. 1494–1504). Red Hook, NY: Curran.
Verhagen, M., Gaizauskas, R., Schilder, F., Hepple, M., Katz, G., & Pustejovsky, J. (2007). SemEval-2007 task 15: TempEval temporal relation identification. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007) (pp. 75–80). Prague: Association for Computational Linguistics.
Vossen, P. (Ed.). (2004). EuroWordNet: A multilingual database with lexical semantic networks for European languages. Dordrecht: Kluwer.
Weaver, W. (1949). Translation. In W. Locke & A. Booth (Eds.), Machine translation of languages: Fourteen essays (pp. 15–23). Cambridge, MA: MIT Press.
Wieting, J., Bansal, M., Gimpel, K., & Livescu, K. (2015). From paraphrase database to compositional paraphrase model and back. Transactions of the Association for Computational Linguistics, 3, 345–358.
Wong, Y. W., & Mooney, R. J. (2007). Learning synchronous grammars for semantic parsing with lambda calculus. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL-2007) (pp. 960–967). Stroudsburg, PA: Association for Computational Linguistics.
Xue, N., & Palmer, M. (2008). Adding semantic roles to the Chinese Treebank. Natural Language Engineering, 15(1), 143–172.
Young, P., Lai, A., Hodosh, M., & Hockenmaier, J. (2014). From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2, 67–78.
Yuret, D., Rimell, L., & Han, A. (2013). Parser evaluation using textual entailments. Language Resources and Evaluation, 47(3), 639–659.
Zapirain, B., Agirre, E., Màrquez, L., & Surdeanu, M. (2010). Improving semantic role classification with selectional preferences. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 373–376). Stroudsburg, PA: Association for Computational Linguistics.
Zelle, J. M., & Mooney, R. J. (1996). Learning to parse database queries using inductive logic programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI) (pp. 1050–1055). Menlo Park, CA: AAAI Press.
Zettlemoyer, L., & Collins, M. (2005). Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI) (pp. 658–666). Arlington, VA: AUAI Press.
Zettlemoyer, L., & Collins, M. (2007). Online learning of relaxed CCG grammars for parsing to logical form. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL) (pp. 678–687). Stroudsburg, PA: Association for Computational Linguistics.
(2.) Sentences (19) and (20), originally collected from the internet using simple search patterns, were used in a class assignment on deciding quantifier scope. Each sentence received about 50% of the votes for the reading in which the first quantifier outscopes the second, and about 50% for the reverse scoping.
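For readers unfamiliar with the phenomenon, the two competing readings can be illustrated with a generic textbook example (not the original sentences (19) and (20)): a sentence like "Every student read a book" admits two first-order representations, depending on which quantifier takes wide scope.

```latex
% Surface-scope reading: "every" outscopes "a"
% (each student read a possibly different book)
\forall x\,\bigl(\mathit{student}(x) \rightarrow
  \exists y\,(\mathit{book}(y) \wedge \mathit{read}(x,y))\bigr)

% Inverse-scope reading: "a" outscopes "every"
% (one particular book was read by all students)
\exists y\,\bigl(\mathit{book}(y) \wedge
  \forall x\,(\mathit{student}(x) \rightarrow \mathit{read}(x,y))\bigr)
```

The 50/50 vote split reported in the note corresponds to annotators being evenly divided between these two kinds of reading.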