Papers
- The Winograd schema challenge - paper establishing the Winograd schema challenge, by Hector J. Levesque and co-authors
- Understanding natural language - seminal paper by Terry Winograd
- The Defeat of the Winograd Schema Challenge - By 2019, a number of AI systems based on large pre-trained transformer language models, fine-tuned on these kinds of problems, achieved better than 90% accuracy. The paper reviews the history of the Winograd Schema Challenge and discusses the lasting contributions of the research on the WSC over the preceding decade.
- WinoGrande: An Adversarial Winograd Schema Challenge at Scale - WinoGrande is a collection of 44k problems, inspired by the Winograd Schema Challenge but adjusted to improve scale and robustness against dataset-specific bias. It is formulated as a fill-in-the-blank task with binary options: the goal is to choose the right option for a given sentence, which requires commonsense reasoning.
Overview
The Winograd schema challenge (WSC) is a test of machine intelligence, proposed in 2012 and designed as an improvement on the Turing test. The test consists of multiple-choice questions with two answer options, each an instance of a Winograd schema, a sentence format named after Terry Winograd.
On the surface, Winograd schema questions simply require the resolution of anaphora: the machine must identify the antecedent of an ambiguous pronoun in a statement. Levesque argues, however, that resolving the pronoun correctly requires knowledge and commonsense reasoning.
The challenge was considered defeated by 2019, when a number of transformer-based language models achieved accuracies of over 90%. See The Defeat of the Winograd Schema Challenge.
Winograd schemas
The key factor in the WSC is the special format of its questions, which are derived from Winograd schemas. Questions of this form may be tailored to require knowledge and commonsense reasoning in a variety of domains. They must also be written carefully so that selectional restrictions or statistical information about the words in the sentence do not give away the answer.
Origin of Winograd schemas
The first cited example of a Winograd schema (and the reason for their name) is due to Terry Winograd:
The city councilmen refused the demonstrators a permit because they [feared/advocated] violence.
The choices of "feared" and "advocated" turn the schema into its two instances:
- The city councilmen refused the demonstrators a permit because they feared violence.
- The city councilmen refused the demonstrators a permit because they advocated violence.
The schema challenge question is, "Does the pronoun 'they' refer to the city councilmen or the demonstrators?" Switching between the two instances of the schema changes the answer. The answer is immediate for a human reader but difficult for a machine to reproduce.
Levesque argues that knowledge plays a central role in these problems: the answer to this schema depends on our understanding of how councilmen and demonstrators typically behave and relate to one another.
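The bracket notation used above can be expanded mechanically into the two instances. The following is a minimal sketch, assuming a schema is written as a plain string with one or more `[special/alternate]` pairs; the function name `expand_schema` and the regular expression are illustrative choices, not part of any official WSC tooling.

```python
import re
from typing import Tuple

# Matches a bracketed alternative such as "[feared/advocated]".
ALTERNATIVES = re.compile(r"\[([^\[\]/]+)/([^\[\]/]+)\]")


def expand_schema(schema: str) -> Tuple[str, str]:
    """Return the two instances of a schema written in bracket notation.

    The first instance uses the word before the slash (the special word),
    the second uses the word after it (the alternate word).
    """
    if ALTERNATIVES.search(schema) is None:
        raise ValueError("schema contains no [special/alternate] pair")
    with_special = ALTERNATIVES.sub(lambda m: m.group(1), schema)
    with_alternate = ALTERNATIVES.sub(lambda m: m.group(2), schema)
    return with_special, with_alternate


schema = (
    "The city councilmen refused the demonstrators a permit "
    "because they [feared/advocated] violence."
)
for instance in expand_schema(schema):
    print(instance)
```

Running the sketch prints the two instances listed above, one per line.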
Ernest Davis has compiled a list of over 140 Winograd schemas from various sources as examples of the kinds of questions that should appear on the Winograd schema challenge: The Winograd Schema Challenge - Ernest Davis, Leora Morgenstern, and Charles Ortiz
Formal description
A Winograd schema challenge question consists of three parts:
- A sentence or brief discourse that contains the following:
  - Two noun phrases of the same semantic class (male, female, inanimate, or group of objects or people),
  - An ambiguous pronoun that may refer to either of the above noun phrases, and
  - A special word and alternate word, such that if the special word is replaced with the alternate word, the natural resolution of the pronoun changes.
- A question asking for the identity of the ambiguous pronoun's antecedent, and
- Two answer choices corresponding to the noun phrases in question.
A machine will be given the problem in a standardized form which includes the answer choices, thus making it a binary decision problem.
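One way to picture the standardized form is as a small record plus a scoring function. The sketch below is an assumption about how such a question could be represented, not the format used by any official WSC dataset; the class `WSCQuestion`, its field names, and the `accuracy` helper are hypothetical. The answer labels follow the usual reading of Winograd's example (the councilmen feared violence, the demonstrators advocated it).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class WSCQuestion:
    sentence: str                # sentence or brief discourse
    pronoun: str                 # the ambiguous pronoun
    candidates: Tuple[str, str]  # the two noun phrases offered as answers
    answer: int                  # index (0 or 1) of the correct candidate

    def prompt(self) -> str:
        """Render the question together with its two answer choices."""
        first, second = self.candidates
        return f'{self.sentence} Does "{self.pronoun}" refer to {first} or {second}?'


def accuracy(resolver: Callable[[WSCQuestion], int],
             questions: List[WSCQuestion]) -> float:
    """Fraction of questions for which the resolver picks the right index."""
    correct = sum(resolver(q) == q.answer for q in questions)
    return correct / len(questions)


template = ("The city councilmen refused the demonstrators a permit "
            "because they {} violence.")
candidates = ("the city councilmen", "the demonstrators")
questions = [
    WSCQuestion(template.format("feared"), "they", candidates, answer=0),
    WSCQuestion(template.format("advocated"), "they", candidates, answer=1),
]


def always_first(question: WSCQuestion) -> int:
    """Trivial baseline: always choose the first candidate."""
    return 0


print(questions[0].prompt())
print(accuracy(always_first, questions))  # prints 0.5
```

Because the two instances of a schema have opposite answers, a resolver that ignores the special word scores exactly 50% on the pair, which is the chance level for a binary decision problem.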
Advantages
- Solving Winograd schemas requires knowledge and commonsense reasoning.
- Winograd schemas of varying difficulty may be designed, involving anything from simple cause-and-effect relationships to complex narratives of events.
- They may be constructed to test reasoning ability in specific domains (e.g., social/psychological or spatial reasoning).
- There is no need for human judges.
Pitfalls
Developing the questions is challenging: they need to be carefully tailored to ensure that solving them requires commonsense reasoning.
For example, the paper The Winograd schema challenge gives the following example of a so-called Winograd schema that is "too easy":
The women stopped taking pills because they were [pregnant/carcinogenic]. Which individuals were [pregnant/carcinogenic]?
The answer to this question can be determined on the basis of selectional restrictions: in any situation, pills do not get pregnant, women do; women cannot be carcinogenic, but pills can. Thus this answer could be derived without the use of reasoning, or any understanding of the sentences' meaning: all that is necessary is data on the selectional restrictions of pregnant and carcinogenic.
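The shortcut described above can be made concrete with a lookup table. The following is a minimal sketch under stated assumptions: the tables `CAN_DESCRIBE` and `SEMANTIC_CLASS` are hand-written for this one example and are hypothetical, not drawn from any real lexical resource.

```python
from typing import Optional, Tuple

# Hypothetical selectional-restriction data: the semantic classes that each
# adjective can sensibly describe. Hand-written for this example only.
CAN_DESCRIBE = {
    "pregnant": {"person"},
    "carcinogenic": {"substance"},
}

# Hypothetical semantic class of each candidate noun phrase.
SEMANTIC_CLASS = {
    "the women": "person",
    "the pills": "substance",
}


def resolve_by_selection(adjective: str,
                         candidates: Tuple[str, str]) -> Optional[str]:
    """Pick the only candidate the adjective can describe, if there is one.

    Returns None when the restrictions do not single out one candidate,
    which is the situation a well-designed Winograd schema should create.
    """
    compatible = [
        c for c in candidates
        if SEMANTIC_CLASS[c] in CAN_DESCRIBE[adjective]
    ]
    return compatible[0] if len(compatible) == 1 else None


candidates = ("the women", "the pills")
print(resolve_by_selection("pregnant", candidates))      # the women
print(resolve_by_selection("carcinogenic", candidates))  # the pills
```

No reasoning about the sentence is involved: the lookup alone answers both instances. In the councilmen example, by contrast, both candidates belong to the same semantic class, so a resolver of this kind would have nothing to distinguish them.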