Language models assessment through linguistically motivated contrasts
A benchmark for Italian (BLiMP-IT)
DOI:
https://doi.org/10.11576/glow-1242
Keywords:
Language model evaluation, Minimal pairs, Morphosyntax, Poverty of stimulus
Abstract
We present BLiMP-IT, a linguistically informed benchmark for assessing the performance of Italian Language Models (LMs). Inspired by state-of-the-art tools for LM evaluation and informed by both generative theorizing and psycholinguistic metrics, the benchmark tests a rich variety of structures using minimal pair contrasts, i.e., a grammatical sentence and an ungrammatical one that differ minimally with respect to a single morphosyntactic property. By prompting the model to assign a probability value to each sentence in a pair, BLiMP-IT tests LMs' accuracy as well as their ability to reach linguistically meaningful generalizations, ultimately offering insights into human-machine comparability and the validity of the Poverty of Stimulus hypothesis.
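The minimal-pair procedure described in the abstract can be illustrated with a short sketch. It assumes the common convention that a pair counts as correct when the model assigns a higher (log-)probability to the grammatical sentence than to the ungrammatical one; the abstract does not spell out the scoring rule, so this is an assumption. The function name `minimal_pair_accuracy`, the lookup-table scorer `toy_log_prob`, the example sentences, and the scores are all hypothetical stand-ins, not items or results from BLiMP-IT.

```python
from typing import Callable, Iterable, Tuple


def minimal_pair_accuracy(
    pairs: Iterable[Tuple[str, str]],
    log_prob: Callable[[str], float],
) -> float:
    """Return the fraction of (grammatical, ungrammatical) pairs in which
    the grammatical sentence receives a strictly higher score."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one minimal pair is required")
    correct = sum(1 for good, bad in pairs if log_prob(good) > log_prob(bad))
    return correct / len(pairs)


# Hypothetical scores standing in for a real LM's sentence log-probabilities.
TOY_SCORES = {
    "Il gatto dorme.": -12.0,
    "Il gatto dormono.": -19.5,   # subject-verb agreement violation
    "Le ragazze cantano.": -14.0,
    "Le ragazze canta.": -13.0,   # violation that the toy scorer prefers
}


def toy_log_prob(sentence: str) -> float:
    """Look up a made-up log-probability (assumption: not a real model)."""
    return TOY_SCORES[sentence]


# Illustrative pairs: (grammatical, ungrammatical).
PAIRS = [
    ("Il gatto dorme.", "Il gatto dormono."),
    ("Le ragazze cantano.", "Le ragazze canta."),
]

accuracy = minimal_pair_accuracy(PAIRS, toy_log_prob)
print(accuracy)  # 0.5: the toy scorer gets one of the two pairs right
```

To evaluate an actual LM, `toy_log_prob` would be replaced by a function that sums the model's token log-probabilities over the sentence; the comparison logic stays the same.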
License
Copyright (c) 2026 Veronica Bressan, Matilde Barbini, Achille Fusco, Sofia Neri, Maria Letizia Piccini Bianchessi, Sarah Rossi, Cristiano Chesi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.