How to do human evaluation - A brief introduction to user studies in NLP

Schuff, Hendrik, Lindsey Vanderlyn, Heike Adel, and Ngoc Thang Vu. 2023. “How to Do Human Evaluation: A Brief Introduction to User Studies in NLP.” Natural Language Engineering 29 (5): 1199–1222. https://doi.org/10.1017/S1351324922000535.

Notes

Considerations for human-centered NLP evaluations

In-text annotations

"On the other hand, there are task-specific NLP resources. For example, van der Lee et al. (2019, 2021), Belz, Mille, and Howcroft (2020) provide guidelines on human evaluation with a focus on natural language generation (NLG), Sedoc et al. (2019) present an evaluation methodology specifically for chatbots, and Iskender, Polzehl, and MΓΆller (2021) provide guidelines for human evaluation for summarization tasks." (Page 1201)

"this paper aims to provide an overview that focuses on commonalities of human evaluation across NLP without restriction to a single task and seeks a good balance between generality and relevance to foster an overall understanding of important aspects in human evaluation, how they are connected, and where to find more information." (Page 1201)