The Welocalize Language Tools Team recently presented at the 2015 EAMT Conference in Antalya, Turkey. Olga Beregovaya, Welocalize VP of Language Tools and Automation, was the invited guest speaker at the conference. She presented “What we want, what we need, what we absolutely can’t do without – an enterprise user’s perspective on machine translation technology and stuff around it,” with the main objective of promoting collaboration between academia and field users. Together with Welocalize Senior Computational Linguist Dave Landan, Olga also presented “Streamlining Translation Workflows with Welocalize StyleScorer” as part of the project and product description poster session.
In this blog, Olga Beregovaya, Dave Landan and Dave Clarke, Principal Engineer for the Language Tools Team, share their insights from the 2015 EAMT Conference.
HIGHLIGHTS FROM OLGA BEREGOVAYA
Olga Beregovaya gives her impressions of EAMT 2015 and highlights her favorite presentations from the user track.
For a global language service provider, language technology and translation automation strategy is very important. The EAMT conference and associated conferences are excellent forums to attend, as the team can share real-life MT production experiences and learn more about the latest innovations and research projects. As always, there were many interesting research papers and posters at EAMT, all delivered by highly talented colleagues in the field of MT and all describing very innovative and promising approaches.
I was proud of Welocalize’s own poster presentation, Streamlining Translation Workflows with StyleScorer, which describes work by colleague Dave Landan. Capturing and evaluating the style of both training corpora and target text has traditionally been one of the biggest challenges in the industry. The tool Dave has created allows us to compare the style of the input text with that of the available training data and build the most relevant MT engine. It also lets us assess the stylistic consistency of the target text and its adherence to the client’s style guide.
The poster presented by Mārcis Pinnis, Dynamic Terminology Integration Methods in Statistical Machine Translation, was very interesting for the team. Integrating terminology in a linguistically aware way is a major pain point for domain adaptation of SMT engines. Speaking as a program owner, I found this poster presentation particularly relevant to our work.
Another very relevant presentation was the paper delivered by Laxström et al., Content Translation: Computer-assisted translation tool for Wikipedia articles. This presentation described a tool created by Wikipedia to promote translation and post-editing of machine-translated articles by Wikipedia users. Community translation is more important for Wikipedia than for any other organization in the world. As content democratization is the key paradigm shift of modern times, tools that enable a “casual translator” to contribute and make content available globally have become an essential component of the global content universe.
Finally, Joss Moorkens and Sharon O’Brien presented an excellent poster called Post-Editing Evaluations: Trade-offs between Novice and Professional Participants. Building an efficient and productive post-editing supply chain that is open to new tools and new ways of working is essential to the success of an LSP’s MT program. Joss and Sharon compared how experienced translators and novice users perceive MT output and a new CAT environment.
HIGHLIGHTS FROM DAVE LANDAN
Dave Landan, Computational Linguist at Welocalize and EAMT 2015 presenter, identified two presentations he found particularly interesting.
This year’s EAMT conference started strong with several interesting talks and papers on a range of topics. While there were many strong research papers, I would like to mention two that stood out for me. Bruno Pouliquen presented findings on linear interpolation of small, domain-specific models with larger general models. At Welocalize, we hope to try these methods with our own data, and we are optimistic about the possibilities! The other research paper that stood out for me was by Wäschle and Riezler. This paper presented innovations around using fuzzy matches from monolingual target language documents to improve translations. I am excited about expanding our collaborations with the academic community.
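As a rough illustration of what linear interpolation means here, the sketch below mixes a small domain-specific probability table with a larger general one using a single weight. It is a toy example and not the method from the talk: the function name `interpolate`, the weight `lam`, and the tiny word-probability tables are all assumptions made up for illustration, and real systems interpolate full language or translation models and tune the weight on held-out data.

```python
def interpolate(p_domain, p_general, lam):
    """Mix two probability tables: lam * domain + (1 - lam) * general.

    p_domain and p_general are hypothetical dicts mapping a word to its
    probability; lam is the weight given to the domain-specific model.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    vocab = set(p_domain) | set(p_general)
    # A word missing from one model contributes probability 0 from that model.
    return {
        word: lam * p_domain.get(word, 0.0) + (1.0 - lam) * p_general.get(word, 0.0)
        for word in vocab
    }


# Toy data, invented for this example only.
domain_model = {"engine": 0.5, "torque": 0.5}
general_model = {"engine": 0.1, "the": 0.9}

mixed = interpolate(domain_model, general_model, lam=0.7)
for word in sorted(mixed):
    print(word, round(mixed[word], 3))
```

Because both inputs sum to one and the weights sum to one, the mixed table is still a valid probability distribution. A higher `lam` pulls the result toward the in-domain data, while the general model still covers words the small model has never seen.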
HIGHLIGHTS FROM DAVE CLARKE
Dave Clarke, Principal Engineer at Welocalize, is a regular participant at EAMT. One topic that was touched on many times at EAMT 2015 was the evolution of CAT tools and their impact on productivity. He shared the following perspective.
From a technical or tools perspective, the EAMT conference provided considerable insight into how translation tools could and should evolve. One such insight was provided by the best paper award winner, “Assessing linguistically aware fuzzy matching in translation memories,” by Tom Vanallemeersch and Vincent Vandeghinste from the University of Leuven. The algorithms typically used in CAT tools to calculate fuzzy match values from translation memories have little or no linguistic awareness, yet those values are firmly established as stable units in our industry’s word currency. This paper implemented and tested alternative fuzzy match algorithms that identify potentially useful matches based on their linguistic similarities. The results were gathered from tests with translation master’s degree students, in which translation time and keystrokes were measured. The results strongly suggest the potential for unlocking further productivity from existing resources.
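To make the lack of linguistic awareness concrete, here is a minimal sketch of a purely surface-level fuzzy match score based on word-level edit distance. It is an assumption-laden illustration only: commercial CAT tools use their own proprietary formulas, this is not the algorithm from the paper, and the function names `token_edit_distance` and `fuzzy_match_score` are invented for the example.

```python
def token_edit_distance(a, b):
    """Levenshtein distance between two token lists (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, token_a in enumerate(a, 1):
        curr = [i]
        for j, token_b in enumerate(b, 1):
            cost = 0 if token_a == token_b else 1
            curr.append(min(prev[j] + 1,          # delete token_a
                            curr[j - 1] + 1,      # insert token_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def fuzzy_match_score(source, tm_source):
    """Score in [0, 1]: 1 minus edit distance over the longer segment length."""
    a = source.lower().split()
    b = tm_source.lower().split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty segments are treated as identical
    return 1.0 - token_edit_distance(a, b) / longest


# Hypothetical segments, invented for this example.
colour_change = fuzzy_match_score("Press the red button", "Press the green button")
inflection_change = fuzzy_match_score("The file was deleted", "The files were deleted")
print(colour_change, inflection_change)
```

In this sketch, "file" versus "files" and "was" versus "were" each count as a complete mismatch, so a segment that differs only in number scores lower than one where a content word has changed. A linguistically aware matcher could, for instance, compare lemmas or syntactic structure instead of raw tokens, which is the kind of alternative the paper explores.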
The other presentation that stood out for me was “Can Translation Memories afford not to use paraphrasing?” by Rohit Gupta, Constantin Orasan, Marcos Zampieri, Mihaela Vela and Josef Van Genabith.
More MT productivity and quality can be achieved with incremental and specialized improvements; however, it will be a cumulative process. Importantly, NLP can drive ‘intelligent’ aids to productivity within a translator’s working environment, including auto-suggest/complete, advanced fuzzy matching, automatic repair and others. Not all will benefit every user. CAT tool platforms may now evolve so that these innovations can be quickly absorbed into the environment with little cost or effort. Each translator could then maximize their own productivity with the combination of aids that best suits their style of work. We even saw a project from ADAPT in the early stages of developing a platform for CAT tool designers that allows the fast definition and measuring of data during testing of prototype productivity-enhancement functions.
To echo the words of the outgoing EAMT President, Professor Andy Way, it was good to see researchers really getting to grips with specific, known problems. It was encouraging to see more focused work on the kinds of errors that we know first-hand to have a particular impact on productivity, for example, improvements in terminology selection, new methods to improve the choice of preposition and more. It was also encouraging to see the increase in research presented with supporting data gained from end-user evaluation rather than the automatic evaluation metric staples that have long been the norm. In fact, ‘BLEU scores’ almost, just almost, became a dirty… bi-gram.
“Overall, EAMT 2015 was a great conference, attended by extremely talented people, and we should not forget to mention beautiful Antalya, Turkey, where the conference was held this year,” said Olga Beregovaya.
View Olga Beregovaya’s EAMT presentation, “What we want, what we need, what we absolutely can’t do without – an enterprise user’s perspective on machine translation technology and stuff around it” below.
Click the link to see Dave Landan and Olga Beregovaya’s EAMT poster presentation, Streamlining Translation Workflows with StyleScorer: EAMT_POSTER 2015 by Welocalize.