Language models assessment through linguistically motivated contrasts

A benchmark for Italian (BLiMP-IT)

Authors

DOI:

https://doi.org/10.11576/glow-1242

Keywords:

Language model evaluation, Minimal pairs, Morphosyntax, Poverty of stimulus

Abstract

We present BLiMP-IT, a linguistically informed benchmark for assessing the performance of Italian language models (LMs). Inspired by state-of-the-art tools for LM evaluation and informed by both generative theorizing and psycholinguistic metrics, the benchmark tests a rich variety of structures using minimal pair contrasts, i.e., pairs consisting of a grammatical sentence and an ungrammatical one that differ minimally with respect to a single morphosyntactic property. By prompting the model to assign a probability value to each sentence in a pair, BLiMP-IT tests LMs' accuracy as well as their ability to reach linguistically meaningful generalizations, ultimately offering insights into human-machine comparability and the validity of the Poverty of Stimulus hypothesis.
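The minimal pair procedure described in the abstract can be illustrated with a short sketch. It assumes that accuracy is the fraction of pairs in which the model assigns a higher probability to the grammatical sentence than to the ungrammatical one, which is a common convention for minimal pair benchmarks but is not spelled out here. The names `ToyBigramLM`, `minimal_pair_accuracy`, `CORPUS`, and `PAIRS` are hypothetical, the Italian sentences are invented for illustration and are not taken from BLiMP-IT, and the add-one smoothed bigram model is only a stand-in for a real Italian LM.

```python
import math
from collections import Counter
from typing import Callable, Iterable, Tuple

# A minimal pair is (grammatical sentence, ungrammatical sentence).
MinimalPair = Tuple[str, str]


class ToyBigramLM:
    """Hypothetical stand-in for an LM: a bigram model with add-one smoothing."""

    def __init__(self, corpus: Iterable[str]) -> None:
        self.bigrams: Counter = Counter()
        self.contexts: Counter = Counter()
        self.vocab = {"</s>"}
        for sentence in corpus:
            tokens = ["<s>"] + sentence.split() + ["</s>"]
            self.vocab.update(tokens[1:])
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1

    def log_prob(self, sentence: str) -> float:
        """Return the smoothed log-probability of a whitespace-tokenized sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        v = len(self.vocab)
        return sum(
            math.log((self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + v))
            for prev, cur in zip(tokens, tokens[1:])
        )


def minimal_pair_accuracy(
    pairs: Iterable[MinimalPair],
    log_prob: Callable[[str], float],
) -> float:
    """Fraction of pairs where the grammatical sentence scores strictly higher."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one minimal pair is required")
    correct = sum(1 for good, bad in pairs if log_prob(good) > log_prob(bad))
    return correct / len(pairs)


# Invented training sentences and subject-verb agreement pairs (illustrative only).
CORPUS = [
    "il gatto dorme",
    "i gatti dormono",
    "la ragazza dorme",
    "le ragazze dormono",
]
PAIRS = [
    ("il gatto dorme", "il gatto dormono"),
    ("i gatti dormono", "i gatti dorme"),
]

lm = ToyBigramLM(CORPUS)
accuracy = minimal_pair_accuracy(PAIRS, lm.log_prob)
print(accuracy)
```

Passing the scorer as a callable keeps the accuracy computation independent of the model, so the same function could in principle be applied to any LM that exposes a sentence-level log-probability. Because the two sentences in a pair differ in a single property, comparing their scores isolates that property without requiring an absolute grammaticality threshold.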

Published

2026-04-23

How to Cite

Bressan, V. (2026) “Language models assessment through linguistically motivated contrasts: A benchmark for Italian (BLiMP-IT)”, Proceedings of GLOW, 47, pp. 1–14. doi: 10.11576/glow-1242.