Google says there’s a better way to create high-quality training data for AI translation

In a paper published October 14, 2024, Google researchers highlighted the potential of AI translations refined by humans, or human translations refined by large language models (LLMs), as an alternative to traditional human-only references.

Speaking to Slator, Zhongtao Liu, a software engineer at Google, explained that their research addresses a growing challenge in the translation industry: scaling the collection of the high-quality data needed to refine and test machine translation (MT) systems.

As the demand for translation expands across languages, domains, and use cases, traditional methods that rely solely on human translators have become increasingly expensive, time-consuming, and difficult to scale.

To address this challenge, the researchers explored more efficient ways to collect high-quality translation data. They compared eleven different workflows – all-human, all-machine, and hybrid – to determine which was the most effective and cost-efficient.


Human-only workflows involved a single human translation step, optionally followed by one or two human review steps. Machine-only workflows ranged from one-step AI translations using the best available systems – MT systems or LLMs – to more complex workflows in which AI translations were refined by an LLM. Hybrid workflows combined human expertise and AI efficiency: in some cases AI translations were refined by humans (i.e., post-editors), while in others human translations were refined by LLMs.

“Human-machine collaboration can be a faster and more cost-efficient alternative to traditional human translation collection.” — Liu et al.

They found that combining human expertise and AI efficiency can deliver translation quality comparable to, or even better than, traditional workflows performed only by humans, while significantly reducing costs. “Our findings show that human-machine collaboration can match or even exceed the translation quality of humans alone, while being more cost-efficient,” the researchers said.

The best combination of quality and cost appears to be human post-editing of AI translations. This approach matched the quality of traditional human-only methods at just 60% of the cost.

“This indicates that human-machine collaboration can be a faster and more cost-efficient alternative to traditional human translation collection, optimizing both quality and resource allocation by leveraging the strengths of both humans and machines,” they noted.

The researchers emphasized that the quality improvements stem from the complementary strengths of human and AI collaboration, not from the superior capabilities of either the AI or the human post-editor alone, underscoring the importance of leveraging both to achieve optimal translation quality.


They noted that LLMs were less effective than human post-editors at identifying and correcting errors in AI-generated translations. Conversely, human reviewers tended to make fewer changes when reviewing human-generated translations, meaning certain errors could be overlooked. Interestingly, even additional rounds of human review did not substantially improve quality. This observation supports the argument for human-machine collaboration, with each component helping to address the other's blind spots, the researchers said.

“These findings highlight the complementary strengths of human and machine post-editing methods, indicating that a hybrid method is likely the most effective strategy,” they said.

Authors: Zhongtao Liu, Parker Riley, Daniel Deutsch, Alison Lui, Mengmeng Niu, Apu Shah, and Markus Freitag