Title: Information-Theoretic Probing for Linguistic Structure
Authors: Tiago Pimentel, Josef Valvoda, Rowan Hall Maudslay, Ran Zmigrod, Adina Williams, Ryan Cotterell
Published: 7th April 2020 (Tuesday) @ 01:06:36
Link: http://arxiv.org/abs/2004.03061v2
Abstract
The success of neural networks on a diverse set of NLP tasks has led researchers to question how much these networks actually "know" about natural language. Probes are a natural way of assessing this. When probing, a researcher chooses a linguistic task and trains a supervised model to predict annotations in that linguistic task from the network's learned representations. If the probe does well, the researcher may conclude that the representations encode knowledge related to the task. A commonly held belief is that using simpler models as probes is better; the logic is that simpler models will identify linguistic structure, but not learn the task itself. We propose an information-theoretic operationalization of probing as estimating mutual information that contradicts this received wisdom: one should always select the highest performing probe one can, even if it is more complex, since it will result in a tighter estimate, and thus reveal more of the linguistic information inherent in the representation. The experimental portion of our paper focuses on empirically estimating the mutual information between a linguistic property and BERT, comparing these estimates to several baselines. We evaluate on a set of ten typologically diverse languages often underrepresented in NLP research---plus English---totalling eleven languages.
We assert that the natural operationalization of probing is estimating the mutual information (Cover and Thomas, 2012) between a representation-valued random variable and a linguistic property-valued random variable. This operationalization gives probing a clean, information-theoretic foundation, and allows us to consider what "probing" actually means.
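To make the operationalization concrete, here is the standard decomposition this framing rests on (my notation: T for the linguistic property, R for the representation, q for a probe; the paper's own symbols may differ):

```latex
% Mutual information decomposes as entropy minus conditional entropy:
\[
  I(T; R) \;=\; H(T) - H(T \mid R).
\]
% The true conditional entropy H(T | R) is intractable, but any probe q
% gives a cross-entropy that upper-bounds it, H_q(T | R) >= H(T | R), so
% every probe yields a lower bound on the mutual information:
\[
  I(T; R) \;\geq\; H(T) - H_{q}(T \mid R).
\]
```

Since H(T) is fixed by the data, a probe with lower cross-entropy tightens the bound, which is exactly why the paper argues for the best probe one can find.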
Our analysis also provides insight into how to choose a probe family: We show that choosing the highest-performing probe, independent of its complexity, is optimal for achieving the best estimate of mutual information (MI). This contradicts the received wisdom that one should always select simple probes over more complex ones (Alain and Bengio, 2017; Liu et al., 2019; Hewitt and Manning, 2019).
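A minimal sketch of this MI-estimation recipe, not the paper's code: synthetic vectors stand in for BERT representations and random labels for part-of-speech tags, and a linear softmax probe's training cross-entropy serves as the estimate of H(T | R). A stronger probe would lower that cross-entropy and thus raise (tighten) the MI lower bound.

```python
# Illustrative sketch (synthetic data, NOT the paper's setup):
# estimate I(T; R) >= H(T) - H_q(T | R), where H_q is the cross-entropy
# of a linear softmax probe q trained on (representation, tag) pairs.
import numpy as np

rng = np.random.default_rng(0)

n, d, k = 2000, 16, 4          # examples, representation dim, number of tags
tags = rng.integers(0, k, size=n)
# Make representations weakly predictive of the tag: class centers + noise.
centers = rng.normal(size=(k, d))
reps = centers[tags] + 2.0 * rng.normal(size=(n, d))

def entropy(labels):
    """Empirical entropy H(T) in nats."""
    p = np.bincount(labels, minlength=k) / len(labels)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def probe_cross_entropy(reps, tags, epochs=200, lr=0.1):
    """Cross-entropy of a linear softmax probe: an upper bound on H(T|R)."""
    W = np.zeros((reps.shape[1], k))
    b = np.zeros(k)
    onehot = np.eye(k)[tags]
    for _ in range(epochs):                      # full-batch gradient descent
        logits = reps @ W + b
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / len(tags)
        W -= lr * reps.T @ grad
        b -= lr * grad.sum(axis=0)
    logits = reps @ W + b
    logits -= logits.max(axis=1, keepdims=True)
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(tags)), tags].mean()   # mean NLL in nats

h_t = entropy(tags)
h_t_given_r = probe_cross_entropy(reps, tags)
mi_estimate = h_t - h_t_given_r   # a better probe makes this estimate larger
print(f"H(T) = {h_t:.3f} nats, estimated I(T; R) >= {mi_estimate:.3f} nats")
```

Swapping the linear probe for a more expressive one (e.g. an MLP) can only help here, which is the paper's point: complexity is not a defect when the goal is a tight MI estimate.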
Lots of famous names in those papers they just cited… 😬
In this context, we also discuss the recent work of Hewitt and Liang (2019), who propose selectivity as a criterion for choosing families of probes. Hewitt and Liang (2019) define selectivity as the performance difference between a probe on the target task and a control task, writing "[t]he selectivity of a probe puts linguistic task accuracy in context with the probe's capacity to memorize from word types." They further ponder: "when a probe achieves high accuracy on a linguistic task using a representation, can we conclude that the representation encodes linguistic structure, or has the probe just learned the task?"
Information-theoretically, there is no difference between learning the task and probing for linguistic structure, as we will show; thus, it follows that one should always employ the best possible probe for the task without resorting to artificial constraints.
(the above sentence comes right after the preceding one in the text, but I separated it for emphasis.)
Working on a typologically diverse set of languages (Basque, Czech, English, Finnish, Indonesian, Korean, Marathi, Tamil, Telugu, Turkish and Urdu), we show that only in five of these eleven languages do we recover higher estimates of mutual information between part-of-speech tags and BERT (Devlin et al., 2019), a common contextualized embedder, than from a control.
Languages they test on:
- Basque
- Czech
- English
- Finnish
- Indonesian
- Korean
- Marathi
- Tamil
- Telugu
- Turkish
- Urdu