EAMT 2022: What We Learned and What’s Ahead
Faster, More Efficient MT Evaluation Metrics, MT for Audiovisuals, Building MT Engines for Low-Resource Languages, and Gender Bias in MT
by Mara Nunziatini, AI and MT Program Coordinator at Welocalize
A couple of weeks ago, I had the opportunity to attend the 23rd Annual Conference of the European Association for Machine Translation (EAMT) in Ghent, a small city in Belgium not far from Brussels.
I had a wonderful time at the EAMT conference, caught up with old colleagues, and got to know many other bright, MT-savvy attendees. The coffee breaks, welcome drinks, and gala dinner were excellent opportunities to network, exchange opinions, and discuss hot MT-related topics. I especially enjoyed meeting other MT gurus in real life after so many online conferences.
Apart from the great location, great food, great organization, and great people, there were also many great presentations. All were interesting, but a few were especially relevant to me and my team.
COMET vs. COMETinho
First, we learned about a new MT evaluation metric. COMETinho, as the name suggests ("-inho" is a Portuguese diminutive), is a lighter version of COMET, the neural metric for evaluating MT quality. This new metric is very exciting and timely: where COMET requires a lot of computational power to run, COMETinho is faster and less resource-intensive.
Staying up to date on the new metrics proposed by academia and the language industry is paramount for our team. Last year, we submitted a conference paper to the MT Summit highlighting COMET as the metric best correlated with human assessments. Now that a new metric has come out, we’re very curious to see how it compares. We’re definitely taking a close look at this newcomer.
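To make "best correlated with human assessments" a little more concrete: metric comparisons typically score the same MT segments both with the metric and with human judges, then measure how closely the two sets of scores agree. The minimal Python sketch below assumes segment-level scores and uses invented numbers purely for illustration; it computes Pearson's r and Kendall's tau (tau-a variant) and is a generic example, not the setup used in our MT Summit paper.

```python
# Sketch: how closely does an MT metric agree with human judgments?
# All scores below are invented placeholders, not results from any study.
from itertools import combinations
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def kendall_tau(xs: list[float], ys: list[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical segment-level scores for five MT outputs.
human_scores = [78.0, 45.0, 90.0, 60.0, 30.0]   # e.g. direct assessment, 0-100
metric_scores = [0.71, 0.40, 0.85, 0.52, 0.35]  # e.g. scores from a neural metric

print(f"Pearson r:     {pearson(metric_scores, human_scores):.3f}")
print(f"Kendall's tau: {kendall_tau(metric_scores, human_scores):.3f}")
```

Pearson's r rewards a linear relationship between the raw scores, while Kendall's tau only asks whether the metric ranks each pair of segments in the same order as the humans, which is why metric studies often report both.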
Machine Translation for Audiovisuals
Using machine translation (MT) for audiovisual content also got a lot of attention this year. Many of the papers presented focused on direct speech translation (where a sequence-to-sequence model translates the audio directly, without an intermediate transcription), automatically generated subtitles, automatic speech recognition (ASR), and subtitle translation with MT.
In our experience, combining ASR with MT is an exciting but challenging exercise, mainly because of imperfect transcriptions and subtitle segmentation. Customers increasingly ask for this service, so it was great to see the approaches others use to overcome some of the key challenges for MT.
MT for Low-Resource Languages
Low-resource languages were another big focus. This is a hot topic at MT conferences in general: how can we build good MT engines without much data? With the small amount of training data available for under-resourced languages, good performance is difficult to achieve.
In the past, we have discussed different approaches to make up for this lack of data, such as data augmentation strategies. This year, the research took a step forward and proposed some new, interesting solutions, including 1) multilingual engines with knowledge sharing on the encoder side, and 2) multiple subword tokenizations and cross-teaching between high-resource and low-resource language pairs.
Our team is also investigating different approaches to building high-performance MT engines for under-resourced languages. My colleagues Eirini Zafeiridou and Jon Cambra presented this topic at the New Trends in Translation and Technology (NeTTT) conference, which took place in Rhodes, Greece, in July.
Gender Bias in MT
Last, but not least, we discussed gender bias in MT. It’s a relatively new topic in the MT field, but one that’s rapidly gaining attention. Strictly speaking, it’s more accurate to talk about “apparent” gender bias, as the engine itself is not intentionally sexist: ultimately, the bias comes from the training data used. Therefore, gender bias in raw MT output could be mitigated by carefully checking the vocabulary distribution in the training data, for example how often masculine and feminine forms occur.
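As a rough illustration of checking the vocabulary distribution, the minimal Python sketch below counts masculine and feminine pronouns on the target side of a toy English corpus. The word lists, sample sentences, and function names (`gender_counts`, `feminine_share`) are hypothetical; a real audit would need language-specific lists (for grammatically gendered languages such as Italian, that means nouns, articles, and agreement as well) and far more data.

```python
# Sketch: how are masculine and feminine forms distributed in the target
# side of a training corpus? Word lists and sentences are hypothetical.
import re
from collections import Counter

MASCULINE = {"he", "him", "his", "himself"}
FEMININE = {"she", "her", "hers", "herself"}


def gender_counts(sentences: list[str]) -> Counter:
    """Count masculine vs. feminine pronoun tokens across all sentences."""
    counts = Counter({"masculine": 0, "feminine": 0})
    for sentence in sentences:
        for token in re.findall(r"[a-z]+", sentence.lower()):
            if token in MASCULINE:
                counts["masculine"] += 1
            elif token in FEMININE:
                counts["feminine"] += 1
    return counts


def feminine_share(counts: Counter) -> float:
    """Fraction of gendered tokens that are feminine (0.5 means balanced)."""
    total = counts["masculine"] + counts["feminine"]
    return counts["feminine"] / total if total else 0.5


# Toy target-side corpus (invented for illustration).
corpus = [
    "The doctor said he would call back.",
    "The engineer finished his report.",
    "The nurse said she was on shift.",
    "He asked the manager about his schedule.",
]
counts = gender_counts(corpus)
print(counts["masculine"], counts["feminine"])  # 4 1
print(f"{feminine_share(counts):.2f}")          # 0.20
```

A strongly skewed share, especially when it lines up with particular professions as in the toy sentences above, is the kind of signal that would prompt a closer look at the training data before retraining an engine.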
The conference left us with lots of food for thought and inspiration for future work. It’s always a pleasure to join these events, as they’re the perfect place to exchange views with our peers, keep up to date with new discoveries by field researchers, and get a clearer understanding of what the future holds. I can’t wait for the next one!
About Mara Nunziatini
Based in Barcelona, Mara has worked at Welocalize for more than seven years and is a fully qualified English/Spanish-to-Italian translator. She manages and leads several AI and MT programs at Welocalize.