Title: Survey of Automatic Metrics for Evaluating Machine Translation at the Document Level
Authors: Nicolas Dahan, Rachel Bawden, François Yvon
Published: 2024-11-22
Link: https://hal.science/hal-04798759

Abstract

This report presents a survey of document-level automatic metrics for machine translation (MT), addressing the need for evaluation methods that extend beyond sentence-level assessment. Traditional metrics, which evaluate translations on a sentence-by-sentence basis, often fail to capture discourse phenomena, leaving gaps in the assessment of coherence, cohesion, and cross-sentence dependencies. The report starts by introducing the terminology and notation relevant to document-level MT evaluation. It then describes the linguistic phenomena that are crucial at the document level, related for example to lexical and grammatical cohesion and to overall text coherence, which pose significant challenges for MT systems. Following this, we explore human evaluation protocols targeting document-level translation, discussing the methodologies used to judge translation quality in a more holistic manner. Studying human judgments is necessary, as automatic metrics often aim to reproduce them. We also examine the various test sets that have been developed to support the evaluation of document-level MT. The core of the survey focuses on automatic evaluation metrics designed for document-level translation. These metrics aim to provide a more accurate picture of translation quality by considering the broader context and long-range dependencies within a text, offering a more comprehensive assessment than sentence-level metrics. The report concludes with an overview of current trends in document-level MT evaluation, summarizing key challenges and identifying areas for future research. It emphasizes the need to develop context-aware metrics and the importance of creating standardized, document-level test sets to advance MT evaluation.