Rethinking Model Evaluation as Narrowing the Socio-Technical Gap

Liao, Q. Vera, and Ziang Xiao. 2025. “Rethinking Model Evaluation as Narrowing the Socio-Technical Gap.” arXiv. https://doi.org/10.48550/arXiv.2306.03100.

Notes

HCI and NLG Evaluation Methods for LLM Evaluation

[Image: Pasted image 20250606105631.png]

In-text annotations

"However, these human evaluation practices have also been widely criticized for lacking standardization, reproducibility, and validity for assessing model utility in real-world settings" (Page 1)

"socio-technical gap, a challenge that HCI research has long contemplated regarding the inevitable divide between what a technology can do and what people need in the deployment context" (Page 2)

"there is an inevitable gap between the human requirements in a technology deployment context (we refer to as socio-requirements hereafter to mean context-specific human requirements) and a given technical solution." (Page 2)

"As computational mechanisms to be embedded in diverse social contexts, ML models will inevitably face the socio-technical gap" (Page 2)

"model evaluation should make a research discipline that takes up the mission of understanding and narrowing the socio-technical gap" (Page 2)

"Goal 1 (G1): Studying people’s needs, values, and activities in downstream use cases of models, and distilling principles and representations (e.g., taxonomies of prototypical use cases and socio-requirements) that can guide the evaluation methods of ML technologies." (Page 2)

"Goal 2 (G2): Developing evaluation methods that can provide valid and reliable assessments for whether and how much human needs in different downstream use cases can be satisfied. That is, evaluation methods should aim to be the “first-order approximation” for downstream sociorequirements while articulating their limitations and tradeoffs: e.g. they are proxies for some socio-requirements, and each of them may only represent one aspect of sociorequirements." (Page 2)