Data sampling and (In)stability in machine translation evaluation

DOI	https://doi.org/10.18653/v1/2023.findings-acl.826
Author	Lo, Chi-Kiu¹; Knowles, Rebecca¹
Affiliation	National Research Council Canada. Digital Technologies
Format	Text, Article
Conference	The 61st Annual Meeting of the Association for Computational Linguistics (ACL’23), July 9-14, 2023, Toronto, Ontario, Canada
Abstract	We analyze the different data sampling approaches used in selecting data for human evaluation and ranking of machine translation systems at the highly influential Conference on Machine Translation (WMT). By using automatic evaluation metrics, we are able to focus on the impact of the data sampling procedure as separate from questions about human annotator consistency. We provide evidence that the latest data sampling approach used at WMT skews the annotated data toward shorter documents, not necessarily representative of the full test set. Lastly, we examine a new data sampling method that uses the available labour budget to sample data in a more representative manner, with the goals of improving representation of various document lengths in the sample and producing more stable rankings of system translation quality.
Publication date	2023-07-09
Publisher	Association for Computational Linguistics
Licence	Creative Commons, Attribution 4.0 International (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/deed.fr
In	Findings of the Association for Computational Linguistics: ACL 2023 (July 9, 2023): 13064–13074.
Language	English
Peer reviewed	Yes
Record identifier	bceb07fa-0260-423d-91fc-7e7af550dc8b
Record created	2023-07-17
Record modified	2023-11-02

Page details

By: National Research Council Canada

Date modified: 2026-04-20