The annual translation technology conference, memoQfest Americas, took place in Los Angeles last week. David Landan from Welocalize’s Language Tools Team was invited to present about MT at the conference. In this blog, he shares his experience.
Presenting at memoQfest Americas 2014 was an important event for me in several ways. Not only was it my first time attending a memoQfest conference, it was also my first time representing Welocalize at a conference. The icing on the cake was being asked to give a talk about MT at the event.
Public speaking isn’t my strong suit (unless it’s about wine, but that’s another story). I live near Portland, Oregon (rain, anyone?), so when I was asked to spend a few days in sunny Los Angeles in February, I didn’t need to think twice. I put my nervousness aside, prepared a talk, and packed my bags.
MemoQfest was a unique experience. Kilgray Translation Technologies is a fast-growing translation technology company that makes memoQ, an advanced translation environment for translators and reviewers. The company has been putting on the event for several years in their home country of Hungary. For the past few years, they have also hosted an annual event in the US. While many company-sponsored conferences are free to attend and used as an opportunity to sell product or gain exposure, memoQfest attendees pay to attend and most are die-hard memoQ users. Attendees are primarily translators and project managers (PMs), with a few executives, salespeople and tech support folks.
Kilgray uses the event for education: it gives them a chance both to offer workshops on current versions of their products and to announce what is in the works for the next release. What is most notable and refreshing is that the Kilgray folks court criticism at the event. They genuinely want to make their users happy and they take the criticism and feature suggestions seriously. This year’s upcoming release includes fixes and new features suggested at last year’s memoQfest conferences.
Machine translation (MT) is a big, exciting topic in the localization industry. MT was represented in the presentations (mine included) and in the discussions that were happening outside of the scheduled events. I presented a rather technical talk about Welocalize’s work in improving localization throughput by using a set of analytical tools to make MT better. Click here: Better translations through automated source and post-edit analysis, to view the slides from my presentation.
One thing that surprised me was how many translators use generic MT (like Google or Bing) in their day-to-day work. The thing that people need to understand is that computers are dumb. If I ask you what word comes next in the sentence “I need to pick up a dozen eggs and some milk from the …” you’d probably guess something like “store” or “market”. In statistical natural language processing, if your training data includes the phrase “milk from the” followed by “cow” often, then the system will think that “I need to pick up a dozen eggs and milk from the cow” is a perfectly reasonable sentence, because it’s the one with the best probability given the data that was used to train it.
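To make the eggs-and-milk example concrete, here is a minimal sketch of a count-based n-gram predictor in Python. It is only an illustration of the general idea, not how Google, Bing, or Welocalize's engines work; the toy corpus and the function names `train` and `predict` are made up for this example.

```python
from collections import Counter, defaultdict

def train(corpus, n=4):
    """Count which word follows each (n-1)-word context in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            counts[context][tokens[i + n - 1]] += 1
    return counts

def predict(counts, text, n=4):
    """Return the most frequent follower of the last n-1 words, or None."""
    context = tuple(text.lower().split()[-(n - 1):])
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training data, deliberately skewed toward farm talk.
corpus = [
    "we get fresh milk from the cow every morning",
    "the farmer collects milk from the cow",
    "children drink milk from the cow",
    "she bought milk from the store",
]

model = train(corpus)
prompt = "I need to pick up a dozen eggs and some milk from the"
guess = predict(model, prompt)
print(guess)
```

With this training data the model only ever looks at "milk from the" and picks whichever word followed it most often, so the shopping list ends at the cow. Change the balance of the corpus and the guess changes with it.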
MT output is only as good as the data used to train the engine. With large generic MT engines, the training data is very noisy. In fact, some of the training data that’s automatically scraped ends up being someone else’s unedited bad MT output. Garbage in, garbage out, as they say. That’s not to say that everything you get from generic MT is garbage. Google and Bing do reasonably well for high-resource languages in general domains. But if you need professional-quality work, you need a professional-quality MT engine. To get a professional-quality MT engine, you need good data and you need to use translators to post-edit the MT output, depending on what quality levels are required.
What we have developed within the MT and Language Tools team at Welocalize is a way to identify good, clean data so you start with a better engine. We don’t stop there — our tools can identify trouble spots in MT output and we have tools and processes for post-editing that provide a feedback loop to keep improving on every project. Exciting stuff, right?
Now, if only I hadn’t brought the rain with me from Portland to LA.
Email me at email@example.com