Title: The distributional hypothesis
Authors: Magnus Sahlgren
Published: 2008-01-01
Link: https://linguistica.sns.it/RdL/20.1/Sahlgren.pdf

Abstract

Distributional approaches to meaning acquisition utilize distributional properties of linguistic entities as the building blocks of semantics. In doing so, they rely fundamentally on a set of assumptions about the nature of language and meaning referred to as “the distributional hypothesis”. The main point of this hypothesis is that there is a correlation between distributional similarity and meaning similarity, which allows us to utilize the former in order to estimate the latter. However, it is neither clear what kind of distributional properties we should look for, nor in what sense it is meaning that is conveyed by distributional patterns.

This paper examines these two questions, and shows that distributional approaches to meaning acquisition are rooted, and thrive, in structuralist soil. Recognizing this fact enables us to see both the potential and the limits of distributional models, and above all, it provides a clear and concise answer to the two questions posed above: a distributional model accumulated from co-occurrence information contains syntagmatic relations between words, while a distributional model accumulated from information about shared neighbors contains paradigmatic relations between words.

The paper discusses the structuralist origins of the distributional methodology, and distinguishes the two main types of distributional models: the syntagmatic and the paradigmatic. It also takes a summary look at how these models are implemented, and discusses their main parameters from a linguistic point of view. The paper argues that, under the assumptions made by the distributional paradigm, distributional representations do constitute full-blown accounts of linguistic meaning.
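
The contrast between the two model types can be made concrete with a small sketch. The following is an illustration only, not the paper's implementation: the toy corpus, the function names (`syntagmatic_vectors`, `paradigmatic_vectors`, `cosine`), the use of whole sentences as text regions, the window size of 2, and cosine as the similarity measure are all assumptions chosen for brevity. The first model records which text regions each word occurs in, so words score as similar when they co-occur. The second records each word's neighbors, so words score as similar when they share neighbors.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical toy corpus: each inner list is one text region (a sentence).
corpus = [
    ["drink", "hot", "coffee"],
    ["drink", "hot", "tea"],
    ["sip", "strong", "coffee"],
    ["sip", "strong", "tea"],
]


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def syntagmatic_vectors(sentences):
    """Words-by-regions counts: dimension i is sentence i."""
    vectors = defaultdict(Counter)
    for i, sentence in enumerate(sentences):
        for word in sentence:
            vectors[word][i] += 1
    return vectors


def paradigmatic_vectors(sentences, window=2):
    """Words-by-words counts: dimensions are neighbors within the window."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            start = max(0, i - window)
            end = min(len(sentence), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not its own neighbor
                    vectors[word][sentence[j]] += 1
    return vectors


syn = syntagmatic_vectors(corpus)
par = paradigmatic_vectors(corpus)

for a, b in [("coffee", "tea"), ("coffee", "drink")]:
    print(a, b,
          "syntagmatic:", round(cosine(syn[a], syn[b]), 3),
          "paradigmatic:", round(cosine(par[a], par[b]), 3))
```

In this toy corpus, "coffee" and "tea" never appear in the same sentence, so the region-based model gives them no similarity, while the neighbor-based model treats them as maximally similar because their neighbors are identical. "coffee" and "drink" behave the other way around: they co-occur, so the region-based model relates them, but their neighbor profiles overlap only partly.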