What’s New in Machine Translation? Highlights from MT Summit 2021
Machine translation (MT) allows global brands to keep pace with ever-growing content volumes, save on costs, and meet tight time-to-market deadlines to reach wider global audiences with multilingual content strategies.
Experts from Welocalize’s AI Innovation Team participated in MT Summit 2021, contributing to several sessions on advancements in, and use cases for, MT and AI-enabled language technologies. The focus of this year’s summit was wide-ranging: from advances in data-driven MT, MT quality estimation, and post-editing to data preparation for MT training and data augmentation using AI.
From all the insightful sessions, here are some of the standout highlights:
MT continues to be the future of translation
Many of the sessions posed the question: “Will we ever be able to use raw MT without needing human translation or post-editing?”
Even though the quality of MT output can now be outstanding, technological advances have not eliminated the need for human involvement (human-in-the-loop). MT-related solutions are now routinely embedded in the overall localization workflow. As more enterprises embrace the potential of MT, the technology will continue to improve in fluency, adequacy, and actionability.
The Need for Quality (Meta)Data
Many participants stressed the importance of clean, high-quality data for engine training. To address this challenge, many are turning to metadata as a way to filter out unreliable data. It’s expected that in the future MT will be able to respond and adapt to different content types, requirements, and domains.
In a similar vein, our Director of AI Deployments, Alex Yanishevsky, spoke about the concept of ‘Smart LQA’ – using AI to evaluate language and content to inform strategic global content decisions such as source and target suitability.
“We’re trying to find a way to identify and flag at risk source content. This ultimately allows us to suggest editing of the source, predict problematic sentences for all target languages and spend LQA dollars in a targeted rather than random fashion.”
During his session, Alex spoke about the need to go down to the sentence level and collect as much metadata as possible about every entity in the sentence.
“We can use this data to start building predictive models and forecast content performance.”
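To make the idea of sentence-level metadata more concrete, here is a minimal Python sketch. It is not the Smart LQA approach itself, which the session summary does not detail. The features (word count, digits, all-caps words), the `is_at_risk` rule, and its thresholds are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SentenceMetadata:
    """A few illustrative (assumed) features for one source sentence."""
    text: str
    word_count: int
    has_digits: bool
    all_caps_words: int


def collect_metadata(sentence: str) -> SentenceMetadata:
    """Collect simple surface features; a real system would gather far more."""
    words = sentence.split()
    return SentenceMetadata(
        text=sentence,
        word_count=len(words),
        has_digits=any(ch.isdigit() for ch in sentence),
        all_caps_words=sum(1 for w in words if len(w) > 1 and w.isupper()),
    )


def is_at_risk(meta: SentenceMetadata, max_words: int = 30) -> bool:
    """Hypothetical rule: flag very long sentences or heavy use of all caps."""
    return meta.word_count > max_words or meta.all_caps_words >= 3


sentences = [
    "Click Save to store your changes.",
    "PRESS THE RED BUTTON NOW to restart.",
]

# Keep only the sentences that the illustrative rule flags for review.
flagged = [s for s in sentences if is_at_risk(collect_metadata(s))]
print(flagged)
```

In practice, hand-written rules like this would likely give way to a predictive model trained on the collected metadata, but the collect-then-flag shape stays the same.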
MT Quality Estimation and Evaluation Metrics
Many of this year’s sessions focused on recent innovations in MT evaluation and new evaluation metrics. It’s evident that MT quality continues to improve, on occasion even coming close to human parity.
Our NLP Deployment Engineer, Andrea Alfieri, and our AI Program Coordinator, Mara Nunziatini, presented their research exploring how ‘new’ automatic evaluation metrics compare with human assessment. Adoption of these metrics is not yet commonplace in the language industry. They also discussed the more established metrics, such as translation error rate (TER) and Levenshtein edit distance.
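For readers unfamiliar with the established metrics, the sketch below shows word-level Levenshtein edit distance between raw MT output and its post-edited version, along with a simplified edit rate normalized by the length of the post-edited text. This is an illustration only: full TER also counts block shifts as single edits, which this sketch omits, and the function names and example sentences are our own.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn sequence a into sequence b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(
                prev[j] + 1,         # delete x
                curr[j - 1] + 1,     # insert y
                prev[j - 1] + cost,  # substitute x with y (or keep it)
            ))
        prev = curr
    return prev[-1]


def edit_rate(mt_output: str, post_edited: str) -> float:
    """Simplified, TER-like rate: word-level edits divided by the
    number of words in the post-edited text (no shift operation)."""
    mt_tokens = mt_output.split()
    pe_tokens = post_edited.split()
    return levenshtein(mt_tokens, pe_tokens) / max(len(pe_tokens), 1)


mt = "the cat sat on mat"
pe = "the cat sat on the mat"
print(levenshtein(mt.split(), pe.split()))  # one inserted word
print(round(edit_rate(mt, pe), 3))
```

A lower edit rate means the post-editor changed less of the MT output, which is why these metrics are often used as a proxy for post-editing effort.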
“All the new automatic metrics analysed showed a better correlation with Human assessment per language compared to the more established metrics: TER and Levenshtein Edit Distance.”
“We noticed that the performance of each metric was different depending on the language which could suggest the idea of having different ‘go-to’ metrics in place, depending on the language in scope.”
Why the same metric scores differently across languages is still an open research question. However, LSPs can start using these new automatic metrics in production to gather performance data for every language, which should ultimately help the metrics gain widespread use within the industry.
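One way to act on the idea of a per-language ‘go-to’ metric is to measure how strongly each metric correlates with human scores for each language and pick the strongest. The Python sketch below assumes this setup. The metric names and all numbers are invented placeholders, not results from the research presented, and Pearson correlation is just one possible choice of correlation measure.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_metric_per_language(scores):
    """For each language, return the metric whose scores correlate most
    strongly with human assessment. Absolute values are compared because
    error-style metrics correlate negatively with quality."""
    best = {}
    for lang, data in scores.items():
        correlations = {
            name: pearson(values, data["human"])
            for name, values in data["metrics"].items()
        }
        best[lang] = max(correlations, key=lambda name: abs(correlations[name]))
    return best


# Placeholder data for illustration only (NOT real evaluation results).
toy_scores = {
    "de": {
        "human": [1, 2, 3, 4],
        "metrics": {
            "metric_a": [0.1, 0.2, 0.3, 0.4],
            "metric_b": [0.4, 0.1, 0.3, 0.2],
        },
    },
    "ja": {
        "human": [1, 2, 3, 4],
        "metrics": {
            "metric_a": [0.3, 0.1, 0.4, 0.2],
            "metric_b": [0.9, 0.7, 0.5, 0.3],
        },
    },
}

print(best_metric_per_language(toy_scores))
```

With real production data, each list would hold segment-level or document-level scores gathered over time, and a rank-based correlation could be swapped in if the relationship is not expected to be linear.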
Developments in MT Post-Editing
In the last few years, MT technology has seen major changes with the breakthrough of neural machine translation (NMT), a growing number of providers, translation platforms, and approaches to measuring performance.
MT in general is experiencing a peak in demand from translation buyers. At the same time, new models for defining translation quality are becoming more widely adopted, with profound implications for post-editing approaches.
Drawing on their expertise in this area, the Welocalize AI Program Management team held their third tutorial on MT post-editing at this year’s MT Summit. The team shared the latest trends in MT technologies and discussed their impact on post-editing practices, as well as how to integrate MT into large, multi-language translation programs.
The MT Summit 2021 was packed with fantastic insights, and it was great to hear the latest research and innovations in MT.
If you registered for MT Summit 2021, all the sessions are available to watch on demand on the event website.