How to do human evaluation: A brief introduction to user studies in NLP

Schuff, Hendrik, Lindsey Vanderlyn, Heike Adel, and Ngoc Thang Vu. 2023. “How to Do Human Evaluation: A Brief Introduction to User Studies in NLP.” Natural Language Engineering 29 (5): 1199–1222. https://doi.org/10.1017/S1351324922000535.

Notes

Considerations for human-centered NLP evaluations

Ethical and legal considerations
- privacy
- informed consent
- respect for participants
Research questions and hypotheses
- exploratory research questions
- confirmatory research questions
Variables
- operationalize the measurements
- types
  - independent
  - dependent
  - confounding
Metrics
- Likert scales
- visual analog scale
- direct comparisons
- ranked order comparisons
- error classification
- completion time
- bio signals
Qualitative analysis
Level of measurement
- nominal
- ordinal
- interval
- ratio
Experimental designs
- within-subject
- between-subject
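
The difference between the two designs can be sketched as an assignment plan. This is only an illustration: the function names, the round-robin balancing, and the randomized presentation order are assumptions of this sketch, not something these notes attribute to the paper.

```python
import random

def assign_between(participants, conditions, seed=0):
    """Between-subject: each participant is exposed to exactly one condition."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal shuffled participants round-robin so group sizes differ by at most one.
    return {p: [conditions[i % len(conditions)]] for i, p in enumerate(shuffled)}

def assign_within(participants, conditions, seed=0):
    """Within-subject: each participant is exposed to every condition.

    The order is randomized per participant (an assumed way to limit order effects).
    """
    rng = random.Random(seed)
    plan = {}
    for p in participants:
        order = list(conditions)
        rng.shuffle(order)
        plan[p] = order
    return plan

participants = ["p1", "p2", "p3", "p4"]   # hypothetical participant IDs
conditions = ["system_A", "system_B"]     # hypothetical systems under comparison
between_plan = assign_between(participants, conditions)
within_plan = assign_within(participants, conditions)
```

In the between-subject plan every participant contributes one measurement, so more participants are needed; in the within-subject plan every participant contributes one measurement per condition.
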
Crowdsourcing for NLP
- fair compensation
- platform rules
- task description
- incentives and response quality
- pilot study
- data collection
Statistical evaluation of NLP
- estimating the required sample size
- choosing the correct statistical test
- post hoc tests
- multiple comparisons problem
- worked example
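
To make the multiple comparisons item concrete, here is a minimal sketch of two standard corrections, Bonferroni and Holm-Bonferroni. The p-values are made up for illustration, and these notes do not record which correction the paper's worked example uses, so treat the choice of methods as an assumption.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm_bonferroni(p_values, alpha=0.05):
    """Step-down Holm procedure; returns one reject flag per input p-value."""
    m = len(p_values)
    # Visit hypotheses from the smallest to the largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared to alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first failure; all larger p-values are retained
    return reject

p_values = [0.012, 0.04, 0.02, 0.001]  # hypothetical p-values from four tests
bonf = bonferroni(p_values)
holm = holm_bonferroni(p_values)
```

With these made-up values, Bonferroni rejects only the first and last hypotheses, while Holm rejects all four, which shows why Holm is the less conservative of the two at the same family-wise error rate. In practice a library routine such as `statsmodels.stats.multitest.multipletests` can be used instead of hand-written code.
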

In-text annotations

"On the other hand, there are task-specific NLP resources. For example, van der Lee et al. (2019, 2021), Belz, Mille, and Howcroft (2020) provide guidelines on human evaluation with a focus on natural language generation (NLG), Sedoc et al. (2019) present an evaluation methodology specifically for chatbots, and Iskender, Polzehl, and Möller (2021) provide guidelines for human evaluation for summarization tasks." (Page 1201)

"this paper aims to provide an overview that focuses on commonalities of human evaluation across NLP without restriction to a single task and seeks a good balance between generality and relevance to foster an overall understanding of important aspects in human evaluation, how they are connected, and where to find more information." (Page 1201)