Generating Task-Pertinent Sorted Error Lists for Speech Recognition

Galibert, Olivier, Mohamed Ameur Ben Jannet, Juliette Kahn, and Sophie Rosset. 2016. “Generating Task-Pertinent Sorted Error Lists for Speech Recognition.” In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), edited by Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, et al. European Language Resources Association (ELRA). https://aclanthology.org/L16-1297/.

Notes

In-text annotations

"The WER is thus an error-enumeration based metric which, for its final score, considers every error as equally important. The error importance measure associated to WER is then naturally the occurence count of each error." (Page 1883)

"When ASR is a first step in a more complex task, such as NER, automatic translation or language understanding, numerous studies shown that the WER is not always well correlated to the performance of the overall task, for example, (Garofolo et al., 2000) in the context of an information retrieval task, (He et al., 2011) in the context of speech translation and (Wang et al., 2003) in the context of spoken language understanding." (Page 1883)

"Having a metric that allows to estimate the quality of an ASR system given a specific task is interesting but doesn’t necessarily allow to obtain a list of the most important and frequent errors. However such a list is very important to understand the problems and even improve the ASR system (Dufour and Esteve, 2008)." (Page 1884)

"The NE-WER was introduced in order to create a metric more adapted to case of named entities extraction from ASR output. It is built similarly to the WER, on a Levenstein alignment of reference and hypothesis, but it counts errors only on the named entity spans. NE-WER is given by equation 2, where DNE, INE and SNE are the numbers of deleted, inserted and substitued words belonging to named entities, and NNE is the total number of words belonging to named entities in the reference." (Page 1884)
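
Note to self: the quote refers to "equation 2" without reproducing it. Going only by the symbol definitions in the quote and the statement that NE-WER is "built similarly to the WER", it presumably has the same shape as the standard WER ratio, restricted to named-entity words. This is a reconstruction from the quoted definitions, not a copy of the paper's typesetting:

$$
\text{NE-WER} = \frac{D_{NE} + I_{NE} + S_{NE}}{N_{NE}}
$$

For instance, under this reading, a reference with 20 named-entity words in which the ASR output deletes 1, inserts 1 and substitutes 2 of them would give (1 + 1 + 2) / 20 = 0.20.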

"Our aim being the creation of an error list ranked by their seriousness for the NER task, the first step is then to generate an error list. The Levenstein alignment (Levenshtein, 1966) used to calculate WER and NE-WER allow us to identify ASR errors." (Page 1885)
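
Note to self: a minimal sketch of the first step described in the quote, namely deriving an error list from a Levenshtein alignment of reference and hypothesis. It ranks errors by occurrence count only, which is the importance measure the paper associates with plain WER (Page 1883); it does not attempt the paper's task-pertinent ranking for NER. The function names `align` and `error_list` and the example sentences are hypothetical, chosen for illustration, and this is not the authors' implementation.

```python
from collections import Counter


def align(ref, hyp):
    """Word-level Levenshtein alignment of two token lists.

    Returns a list of (op, ref_word, hyp_word) tuples, where op is one of
    "ok", "sub", "del" or "ins". All edit operations cost 1 (an assumption).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the bottom-right corner to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            op = "ok" if ref[i - 1] == hyp[j - 1] else "sub"
            ops.append((op, ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            ops.append(("del", ref[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, hyp[j - 1]))
            j -= 1
    ops.reverse()
    return ops


def error_list(pairs):
    """Count every non-matching alignment step over (reference, hypothesis)
    string pairs and return the errors sorted by decreasing occurrence count."""
    counts = Counter()
    for ref, hyp in pairs:
        for op in align(ref.split(), hyp.split()):
            if op[0] != "ok":
                counts[op] += 1
    return counts.most_common()


pairs = [
    ("the cat sat on the mat", "the cat sit on mat"),
    ("she sat down", "she sit down"),
]
ranked = error_list(pairs)
```

With these two toy pairs, the substitution of "sat" by "sit" occurs twice and the deletion of "the" once, so the substitution comes first in `ranked`. Restricting the counted steps to words inside named-entity spans would be the natural adaptation for NE-WER, but that requires span annotations not modelled here.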