Hearers and readers make inferences on the basis of what they hear or read. These inferences are partly determined by the linguistic form that the writer or speaker chooses to give to her utterance. The inferences can be about the state of the world that the speaker or writer wants the hearer or reader to conclude are pertinent, or they can be about the attitude of the speaker or writer vis-à-vis this state of affairs. The attention here goes to the inferences of the first type. Research in semantics and pragmatics has isolated a number of linguistic phenomena that make specific contributions to the process of inference. Broadly, entailments of asserted material, presuppositions (e.g., factive constructions), and invited inferences (especially scalar implicatures) can be distinguished.
While we make these inferences all the time, they have been studied piecemeal only in theoretical linguistics. When attempts are made to build natural language understanding systems, the need for a more systematic and wholesale approach to the problem is felt. Some of the approaches developed in Natural Language Processing are based on linguistic insights, whereas others use methods that do not require (full) semantic analysis.
In this article, I give an overview of the main linguistic issues and of a variety of computational approaches, especially those stimulated by the RTE challenges first proposed in 2004.
In the linguistic literature, the term theme has several interpretations, one of which relates to discourse analysis and two others to sentence structure. In a more general (or global) sense, one may speak about the theme or topic (or topics) of a text (or discourse), that is, to analyze relations going beyond the sentence boundary and try to identify some characteristic subject(s) for the text (discourse) as a whole. This analysis is mostly a matter of the domain of information retrieval and only partially takes into account linguistically based considerations. The main linguistically based usage of the term theme concerns relations within the sentence. Theme is understood to be one of the (syntactico-) semantic relations and is used as the label of one of the arguments of the verb; the whole network of these relations is called thematic relations or roles (or, in the terminology of Chomskyan generative theory, theta roles and theta grids). Alternatively, from the point of view of the communicative function of the language reflected in the information structure of the sentence, the theme (or topic) of a sentence is distinguished from the rest of it (rheme, or focus, as the case may be) and attention is paid to the semantic consequences of the dichotomy (especially in relation to presuppositions and negation) and its realization (morphological, syntactic, prosodic) in the surface shape of the sentence. In some approaches to morphosyntactic analysis the term theme is also used referring to the part of the word to which inflections are added, especially composed of the root and an added vowel.