The weMT Engine, part of the weMT framework, is Welocalize’s adaptation of the open-source Moses MT engine, built in collaboration with Precision Translation Tools (PTTools), the developer of DoMT™ Server. The weMT Engine consists of numerous analytics-driven pre- and post-processing tools that maximize training data efficacy and system efficiency in order to deploy the best-quality MT engine possible.
weMT Adaptation of Moses Features
The weMT adaptation of Moses consists of numerous analytics-driven pre- and post-processing tools that maximize training data efficacy and system efficiency in order to deploy the best-quality SMT engine possible. We use many of these same tools for building and deploying Neural MT (NMT), in addition to NMT-specific tools and processes that we have developed in-house.
The engine deployment is a data-driven, end-to-end process commencing with data verification and cleaning. After the MT engine is trained and passes a series of critical threshold tests for automatic scoring and human evaluation, it is deployed and can be used in a globalization ecosystem, namely a TMS with automated workflows.
Engine Training-Retraining Life Cycle:
All steps involved in training an MT engine are driven by analytics. First, one or more parallel corpora are checked for data integrity problems, such as misalignments and unreadable characters. A report is generated, and any anomalies are remedied or removed so that poor data does not enter the training set. Next, the corpora are cleaned and tokenized, and duplicate segments are removed. Finally, data for training, tuning and testing is selected based on highest relevance to the content that will be used in deployment. The resulting MT engine is retrained when additional applicable corpora become available or when prioritized feedback on defects is received from linguists.
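The integrity check, cleaning and deduplication steps described above can be sketched roughly as follows. This is a minimal illustration, not the weMT implementation: the function names, the length-ratio heuristic for spotting likely misalignments, and the default ratio of 3.0 are all assumptions, and whitespace splitting stands in for a real tokenizer.

```python
import unicodedata

REPLACEMENT_CHAR = "\ufffd"  # typically appears where text was decoded incorrectly


def is_readable(text):
    """Return False if the text contains the replacement character or control characters."""
    return not any(
        ch == REPLACEMENT_CHAR or unicodedata.category(ch) == "Cc" for ch in text
    )


def clean_parallel_corpus(pairs, max_length_ratio=3.0):
    """Filter (source, target) pairs and return (kept_pairs, report).

    Hypothetical helper: the checks and the threshold are illustrative assumptions.
    """
    report = {"empty": 0, "unreadable": 0, "length_ratio": 0, "duplicate": 0}
    seen = set()
    kept = []
    for source, target in pairs:
        # Normalize whitespace so trivially different copies compare equal.
        source = " ".join(source.split())
        target = " ".join(target.split())
        if not source or not target:
            report["empty"] += 1
            continue
        if not (is_readable(source) and is_readable(target)):
            report["unreadable"] += 1
            continue
        # A very lopsided token count is treated as a likely misalignment.
        n_src, n_tgt = len(source.split()), len(target.split())
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_length_ratio:
            report["length_ratio"] += 1
            continue
        if (source, target) in seen:
            report["duplicate"] += 1
            continue
        seen.add((source, target))
        kept.append((source, target))
    return kept, report
```

The returned report plays the role of the anomaly report mentioned above: it counts how many segment pairs were dropped and why, so the data can be reviewed before training.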
Once deployed in a TMS with automated workflows, the MT engine complements the translation memory (TM): all segments whose TM match falls below a certain threshold are sent to the MT engine.
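The threshold rule can be illustrated with a short sketch. The function name and the default threshold of 75 are assumptions chosen for the example; the actual threshold is configured per workflow.

```python
def route_segment(tm_match_score, mt_threshold=75):
    """Decide whether a segment is served from TM or sent to the MT engine.

    tm_match_score: best TM match for the segment, as a percentage (0-100).
    mt_threshold: hypothetical cut-off; matches below it go to MT.
    """
    if not 0 <= tm_match_score <= 100:
        raise ValueError("tm_match_score must be between 0 and 100")
    return "MT" if tm_match_score < mt_threshold else "TM"


# Example: route a batch of segments by their best TM match score.
scores = {"seg-1": 100, "seg-2": 82, "seg-3": 40}
routing = {seg_id: route_segment(score) for seg_id, score in scores.items()}
```

With these example scores, only the third segment falls below the threshold and is routed to MT.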
Translation in an Offline CAT Tool:
After both TM and MT are leveraged, the matches are color-coded for the translator in a CAT tool, allowing for easy identification of the different categories: ICE (in-context exact), 100%, fuzzy matches, and MT.
Intuitive tag handling is an important requirement for MT engines. Machine translation of plain text, or even partial tag-handling support, still leaves the translator with the task of reordering tags and thus significantly reduces their productivity. Based on statistical probabilities and linguistic markup, the weMT Engine can place tags according to the grammatical rules of the target language and yield output that requires less post-editing.
The weMT Engine is trained using:
- Ubuntu 12.04 LTS with 32 GB RAM
- 24 cores
- 3 TB RAID
The weMT Engine is deployed in the cloud on Amazon Web Services (AWS). It can be used in standalone translation mode, with Welocalize’s Dispatcher middleware, or through a connector to Welocalize’s TMS, GlobalSight (version 8.5.5 and higher).
Why weMT Adaptation of Moses
Welocalize MT solutions enable organizations to develop the right global content strategy based on scale, voice, quality, productivity and time to market. We help you optimize the total return on your content investments. You can expect to realize benefits including:
- Faster performance via XML-RPC protocol
- Support for XLIFF 1.2
- Support for “intelligent tag repositioning”
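As an illustration of the XML-RPC access mentioned above, the following sketch calls a translation server from Python's standard library. It assumes the weMT Engine exposes the same `translate` method as the stock Moses server, which takes a struct with a `text` field and returns a struct with a `text` field; that assumption, the `translate_segment` helper name, and the endpoint URL in the comments are all hypothetical.

```python
import xmlrpc.client


def translate_segment(proxy, text):
    """Send one segment to an XML-RPC translation server and return the translation.

    Assumes a Moses-server-style interface: translate({"text": ...}) -> {"text": ...}.
    """
    response = proxy.translate({"text": text})
    return response["text"]


# Usage against a running server (hypothetical host and port):
#   proxy = xmlrpc.client.ServerProxy("http://localhost:8080/RPC2")
#   print(translate_segment(proxy, "this is a test"))
```

Passing the proxy in as a parameter keeps the helper easy to exercise with a stand-in object when no server is available.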