Leveraging LLMs to Automate Formality Adaptation in Translation

By Mikaela Grace, Head of AI/ML Engineering, Welocalize

October 9, 2024

From business correspondence to social media interactions, the tone and formality of communication often vary widely across languages and content types. This creates complexity, particularly for translation systems, which struggle when the source and target languages mark formality in different ways.

Large language models (LLMs) offer a promising solution to this issue. These AI-driven translation tools can automatically and accurately manage formality shifts, reducing the reliance on human translators.

The Challenge of Formality in Translation

In translation, conveying the correct meaning is more than just swapping words. Formality is deeply embedded in the grammatical structures of many languages. For example, languages like Spanish, French, and Japanese have distinct verb conjugations and pronouns for formal and informal speech. This creates a unique problem when translating from languages like English, where formality is often expressed through vocabulary and tone rather than grammar.

Traditionally, human translators have manually adjusted translations to fit the required formality, ensuring that a greeting like “How are you?” is appropriately translated as either “¿Cómo estás?” (informal) or “¿Cómo está usted?” (formal) in Spanish, depending on the context.

Source languages that lack explicit grammatical markers for formality create scenarios where a single source segment could accurately translate into several target segments with varying levels of formality. In addition, a company sometimes wants to change the formality of its existing translations, which until recently required expensive manual work to re-translate each source segment at a different formality level.

Vera Senderowicz Guerra, NLP Engineer at Welocalize and experienced linguist, explained, “In many languages, formality is conveyed through grammatical structures rather than vocabulary alone. While automating lexical formality adaptation poses significant challenges, addressing grammatical formality with minimal human intervention is even more complex. The selected model must not only identify and convert formal words into informal (or vice versa) but also ensure that all grammatical inflections sharing the same reference within the sentence are consistently aligned. Achieving this would be a complicated task even for many native speakers, let alone for an automated system.”

Leveraging LLMs in Formality Adaptation

The advent of LLMs has opened new possibilities for automating formality adaptation in translation. Designed to process and generate human-like text, LLMs can be fine-tuned to adapt the formality level of translated content automatically. This innovation has the potential to enhance the consistency and accuracy of translations across languages, reducing the need for human post-editing.

“We’re exploring an innovative application of Generative AI to adapt bilingual content’s target segments from a formal to an informal register. Our method focuses on identifying key grammatical structures and inflections that distinguish formality levels in each target language, enabling the conversion of target segments into an informal tone while maintaining accuracy in relation to the source segment.”
Senderowicz Guerra, NLP Engineer at Welocalize

Because foundational LLMs learn from vast data sets that include various forms of communication, from formal business documents to informal social media posts, they can successfully perform formality pivots where traditional neural machine translation (NMT) has failed. Advanced prompt engineering techniques are required for maximum performance, but with them the models can handle these pivots without extensive model training.
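To make the idea of prompt-driven formality pivots more concrete, here is a minimal sketch of how a few-shot prompt might be assembled. It is an illustration only, not Welocalize's actual prompt: the Example class, the build_formality_prompt function, the instruction wording, and the sample sentences are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One few-shot demonstration (hypothetical structure)."""
    source: str
    formal_target: str
    informal_target: str


def build_formality_prompt(source: str, target: str, language: str,
                           examples: list[Example]) -> str:
    """Assemble a few-shot prompt asking for an informal rewrite of `target`."""
    # Instruction wording is an assumption, not the prompt used in the research.
    lines = [
        f"Rewrite the {language} translation in an informal register.",
        "Keep the meaning faithful to the English source.",
        "Change only pronouns, verb forms, and possessives that mark formality.",
        "",
    ]
    # Each demonstration shows the source, the formal target, and its informal form.
    for ex in examples:
        lines += [
            f"Source: {ex.source}",
            f"Formal: {ex.formal_target}",
            f"Informal: {ex.informal_target}",
            "",
        ]
    # The segment to convert comes last, with the answer slot left open.
    lines += [f"Source: {source}", f"Formal: {target}", "Informal:"]
    return "\n".join(lines)


examples = [Example("How are you?", "¿Cómo está usted?", "¿Cómo estás?")]
prompt = build_formality_prompt(
    "Enjoy your stay.", "Disfrute de su estancia.", "Spanish", examples
)
print(prompt)
```

The resulting string would be sent to whichever LLM is in use. Including the source segment alongside the formal target gives the model a reference for checking that the rewrite stays accurate.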

Experimental Settings and Results

Welocalize conducted experiments in the marketing and hospitality sectors, primarily working with Romance languages, such as Spanish, French, and Italian, with English as the source language. The research used a combination of proprietary BERT-based classifiers and advanced LLMs like GPT-3.5, fine-tuned with language-specific prompts and example corpora, to maintain translation accuracy while adjusting formality.
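The classify-then-convert flow described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the system used in the experiments: the keyword check stands in for the BERT-based classifier, the lookup table stands in for the LLM call, and every name in the code is hypothetical.

```python
from typing import Callable

# Toy stand-in for the classifier: flags Spanish segments that contain a
# formal-address pronoun. A real system would use a trained model instead.
FORMAL_MARKERS = {"usted", "ustedes"}


def toy_is_formal(segment: str) -> bool:
    words = {w.strip("¿?¡!.,;:").lower() for w in segment.split()}
    return bool(words & FORMAL_MARKERS)


# Toy stand-in for the LLM call: a lookup table of known rewrites.
REWRITES = {"¿Cómo está usted?": "¿Cómo estás?"}


def toy_to_informal(segment: str) -> str:
    return REWRITES.get(segment, segment)


def adapt_segments(
    targets: list[str],
    is_formal: Callable[[str], bool],
    to_informal: Callable[[str], str],
) -> list[dict]:
    """Convert only the segments flagged as formal and mark them for review."""
    results = []
    for target in targets:
        if is_formal(target):
            # Converted segments go on to a linguist for checking.
            results.append({"original": target,
                            "output": to_informal(target),
                            "needs_review": True})
        else:
            # Segments not flagged as formal pass through unchanged.
            results.append({"original": target,
                            "output": target,
                            "needs_review": False})
    return results


segments = ["¿Cómo está usted?", "Bienvenidos al hotel."]
results = adapt_segments(segments, toy_is_formal, toy_to_informal)
for r in results:
    print(r)

# Share of segments routed to human review.
share = sum(r["needs_review"] for r in results) / len(results)
print(share)
```

Passing the classifier and converter in as functions keeps the routing logic separate from any particular model, so either stage could be swapped out without touching the rest.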

The results have been promising. Senderowicz Guerra reported, “In our findings, between 28% and 50% of the segments were classified as formal and subsequently converted by the LLM, and only 5% had to be further edited by linguists, which demonstrates the system’s efficacy.” This success rate indicates a significant step forward in automating a task that traditionally requires human expertise.

However, challenges remain. The 5% of segments that human reviewers had to edit consisted mostly of non-indicative verb forms, which posed difficulties because their person conjugation is ambiguous. This highlights the complexity of the task and the areas where further refinement is needed.

Despite these challenges, the impact on workflow efficiency has been substantial. “Overall, using the LLM as a preprocessing stage reduced human effort significantly: not only did our linguists have to review less than 50% of the segments, but their hourly productivity increased by more than 200% due to the minimal number of changes required,” noted Senderowicz Guerra.
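As a rough back-of-the-envelope reading of those two figures, the short calculation below combines them into a single estimate of review hours. It assumes a baseline in which linguists reviewed every segment at the original rate, and it reads "increased by more than 200%" as at least three times the original throughput. Both are assumptions of this sketch, and the function name is hypothetical. The result covers review hours only and is not the same quantity as the reported cost savings.

```python
def relative_review_effort(review_share: float, productivity_increase: float) -> float:
    """Review hours relative to reviewing everything at the old rate.

    review_share: fraction of segments still reviewed (0.5 means 50%).
    productivity_increase: fractional gain in segments per hour (2.0 means +200%).
    """
    speedup = 1.0 + productivity_increase  # a +200% gain is 3x throughput
    return review_share / speedup


# Boundary values taken from the quote: 50% of segments, +200% productivity.
effort = relative_review_effort(review_share=0.5, productivity_increase=2.0)
print(f"{effort:.3f}")
```

With the boundary values from the quote, review hours come to about one sixth of the assumed baseline. Because the quote says "less than 50%" and "more than 200%", that figure is an upper bound under these assumptions.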

This reduction in human intervention speeds up the translation process and allows human translators to focus on more complex, nuanced aspects of language adaptation. Additionally, the cost savings were significant: 68% for Italian, 77% for Spanish, and 74% for French.

Use Cases and Applications

The research has identified two primary use cases for this technology, each with its own set of challenges and priorities:

Post-editing: In this scenario, the critical focus is avoiding hallucinations so that the content is ready for immediate publication or needs only minimal editing. The challenge lies in maintaining translation accuracy and reliability while changing the formality automatically.
Machine translation preparation and cleaning: The goal is to ensure consistency and accuracy between the source and target content, creating clean, reliable data for machine translation systems.

Automating formality adaptation through LLMs reduces the dependency on human translators to manually adjust the tone and formality of translations, leading to increased efficiency and cost savings. It also makes translation workflows faster and more scalable, particularly for businesses that handle large volumes of multilingual content.

Senderowicz Guerra concluded, “This research underscores the potential for cost reduction and quality enhancement in translation processes, contributing to the industry’s ongoing advancement.”

Looking Ahead: Challenges and Opportunities

This research’s implications extend far beyond improving efficiency and cost savings. Current research has focused primarily on Romance languages. Expanding this approach to cover a broader range of language pairs, each with unique formality structures, presents both a challenge and an opportunity for growth.

LLMs’ capability to automate formality adaptation is particularly valuable in fields like marketing, customer service, and diplomatic communications, where the appropriate level of formality can significantly impact the effectiveness of the message.

Further research and development in this area offer exciting opportunities. Expanding the use of LLMs to other languages and content types, such as technical documentation or legal texts, could open new doors for automation in translation. Moreover, applying LLMs to other linguistic challenges, such as style transfer, could decrease human effort and allow companies to adapt their content more quickly and flexibly.