Careless Whisper - Speech-to-Text Hallucination Harms

Koenecke, Allison, Anna Seo Gyeong Choi, Katelyn X. Mei, Hilke Schellmann, and Mona Sloane. 2024. “Careless Whisper: Speech-to-Text Hallucination Harms.” The 2024 ACM Conference on Fairness, Accountability, and Transparency, June 3, 1672–81. https://doi.org/10.1145/3630106.3658996.

Notes

In-text annotations

"While many of Whisper’s transcriptions were highly accurate, we find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio." (Page 1672)

"We evaluate Whisper’s transcription performance on the axis of “hallucinations,” defined as undesirable generated text “that is nonsensical, or unfaithful to the provided source input”" (Page 1672)

"we provide experimental quantification of Whisper hallucinations, finding that nearly 40% of the hallucinations are harmful or concerning in some way" (Page 1672)

"Our key insight (at the time of analysis) is that hallucinations are often non-deterministic, yielding different random text on each run of the API" (Page 1674)
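
Note to self: the non-determinism described in this quote suggests a simple check. Transcribe the same audio twice and look at where the two outputs disagree, since text that appears in only one run is a candidate hallucination. Below is a minimal sketch of that comparison step only, assuming the two transcripts are already available as strings. The function name `divergent_spans` and the sample sentences are hypothetical, and this is not presented as the authors' actual pipeline.

```python
import difflib


def divergent_spans(run_a: str, run_b: str) -> list[tuple[str, str]]:
    """Return (text_in_a, text_in_b) pairs where two transcripts disagree.

    Comparison is word-level and case-insensitive. An empty string on one
    side means the other run produced text with no counterpart.
    """
    words_a = run_a.lower().split()
    words_b = run_b.lower().split()
    matcher = difflib.SequenceMatcher(a=words_a, b=words_b, autojunk=False)

    spans = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag != "equal":
            spans.append((" ".join(words_a[a0:a1]), " ".join(words_b[b0:b1])))
    return spans


if __name__ == "__main__":
    # Invented transcripts for illustration, not data from the paper.
    first_run = "he took the umbrella"
    second_run = "he took the umbrella and then he left quickly"
    for only_a, only_b in divergent_spans(first_run, second_run):
        print(f"run A: {only_a!r} | run B: {only_b!r}")
```

Disagreement between runs only flags a span for review. Deciding whether the span is actually a hallucination still requires checking it against the audio.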