Welocalize, CNGL and iOmegaT: Measuring the Impact of MT on Translator Speed

The CNGL Centre for Global Intelligent Content (“CNGL” stands for Centre for Next Generation Localization) is a collaborative academia-industry research centre. Established in 2007, CNGL combines the expertise of researchers at four Irish universities and localization industry partners to produce tools and technologies. Welocalize is one of CNGL’s industry partners and has worked closely with CNGL since 2011, providing funding, engineering, development and project management resources. Welocalize recently announced the licensing of the iOmegaT suite of tools. iOmegaT is one of a number of collaborative projects Welocalize is working on with CNGL.

Dave Clarke is Principal Engineer on the Welocalize Language Tools team and works closely with CNGL in Dublin. Welocalize’s Louise Law interviewed Dave to find out more about iOmegaT and Welocalize’s work with CNGL.

Can you tell us a bit more about iOmegaT?

The term iOmegaT really describes two main tools which, in combination, make it possible for us to gather real-time translation activity data and later analyze it to produce granular yet understandable information about translation behavior and speed. First, translator activity data is gathered and written to XML files by an adapted version of the open-source computer-aided translation (CAT) tool OmegaT. The adaptation gives rise to the “i” in the name iOmegaT: it stands for instrumented. OmegaT is a free, open-source, Java-based CAT application for professional translators. It has been a community development project for over 10 years. iOmegaT is simply OmegaT with some added event-logging code and other features that extract information about translation segments, such as whether each one is a no-match human translation, a fuzzy-match repair or an MT post-edit, as well as how the translator works with them. The other tool parses the XML instrumentation data produced in the editor and then analyzes, among other things, how the presence of machine translation (MT) affects a translator’s speed compared to their normal from-scratch translation rate.
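As a rough illustration of what the second tool does, the sketch below parses a small XML log and totals words and time per segment type. The log format here is entirely hypothetical: the element name `segment`, the attributes `type`, `words` and `seconds`, and the sample values are assumptions made for this example and are not the actual iOmegaT schema or real measurements.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical log format: element names, attribute names and values are
# assumptions for illustration, not the real iOmegaT instrumentation schema.
SAMPLE_LOG = """
<session translator="t01">
  <segment id="1" type="human" words="12" seconds="90"/>
  <segment id="2" type="mt-post-edit" words="15" seconds="60"/>
  <segment id="3" type="fuzzy-repair" words="10" seconds="50"/>
  <segment id="4" type="human" words="8" seconds="54"/>
  <segment id="5" type="mt-post-edit" words="9" seconds="48"/>
</session>
"""


def totals_by_type(xml_text):
    """Sum word counts and editing time for each segment type in the log."""
    root = ET.fromstring(xml_text)
    totals = defaultdict(lambda: {"words": 0, "seconds": 0.0})
    for seg in root.iter("segment"):
        entry = totals[seg.get("type")]
        entry["words"] += int(seg.get("words"))
        entry["seconds"] += float(seg.get("seconds"))
    return dict(totals)


def words_per_hour(entry):
    """Convert a words/seconds total into a words-per-hour rate."""
    return entry["words"] * 3600.0 / entry["seconds"]


if __name__ == "__main__":
    for seg_type, entry in sorted(totals_by_type(SAMPLE_LOG).items()):
        print(f"{seg_type}: {words_per_hour(entry):.0f} words/hour")
```

Grouping by segment type keeps the from-scratch rate and the post-editing rate separate, which is what makes the later comparison possible.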

In June 2013, Welocalize announced the collaboration with CNGL on the development of iOmegaT, and in March 2014 it announced the licensing of the iOmegaT technology.

How did iOmegaT come about?

Very shortly after Welocalize joined CNGL in 2011, conversations began between me and John Moran, a PhD student at CNGL. John had a research idea to gather real-time data from translators post-editing MT output and also translating “from scratch” with no aid from MT or TM. The idea really resonated with me, as we were suffering from a huge metrics gap when dealing with MT, and it was difficult to tell accurately whether MT output from different vendors was useful to the end user (the post-editor) or not. John developed a rough prototype based on OmegaT. The idea was that the open-source CAT tool would allow a developer to add a reporting component that recorded the time spent translating segments of a translation job. By recording how long it took to post-edit x words and comparing that with the time spent translating x words from scratch, we could get a more reliable measure of whether the MT helped the translator or not.
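The comparison described above can be sketched in a few lines. This is a minimal illustration, not the method used in iOmegaT: the function names and the figures in the usage note are made up for the example.

```python
def words_per_hour(words, seconds):
    """Throughput in words per hour for a given word count and editing time."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return words * 3600.0 / seconds


def mt_speed_change(pe_words, pe_seconds, scratch_words, scratch_seconds):
    """Relative change in throughput when post-editing MT versus translating
    from scratch. A result of 0.25 means 25% faster; a negative result means
    the MT slowed the translator down."""
    baseline = words_per_hour(scratch_words, scratch_seconds)
    post_edit = words_per_hour(pe_words, pe_seconds)
    return (post_edit - baseline) / baseline
```

With illustrative numbers, a translator who post-edits 1,000 words in 3,600 seconds but needs 4,500 seconds to translate 1,000 words from scratch gets `mt_speed_change(1000, 3600, 1000, 4500)` equal to 0.25, that is, 25% faster with MT. Measuring each translator against their own from-scratch rate avoids comparing people who simply work at different speeds.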

Since OmegaT was both a fully functional CAT tool and open-source, it really ‘fitted the bill’. It allowed us to closely emulate a translator’s normal environment while giving us the level of access we needed to the application’s software events, from which we could gather the detailed data. This is where the iOmegaT concept came from: the “i” stood for instrumented, and the proposal was to take the open OmegaT source code and add code to it to record the duration of segment translation and other important information. Welocalize arranged for John to do the majority of the development on-site at Welocalize in Dublin.

How do OmegaT and iOmegaT help clients, translators and LSPs?

Adapting the iOmegaT software allows real-time streams of data from system events, generated by the translators within OmegaT, to be recorded and then analyzed. Highly granular data like this is currently virtually impossible to gather using conventional proprietary CAT tools. By using OmegaT as the base tool, iOmegaT enables this data to be collected and used to measure the impact of MT on translator speed relative to traditional from-scratch translation. Results can also be used to establish fair pricing of post-editing work and to help identify the circumstances in which translators can benefit most from MT in terms of speed.

Do OmegaT and iOmegaT integrate with Welocalize’s open source TMS, GlobalSight?

Yes.  The integration of Welocalize’s GlobalSight 8.5 with OmegaT provides a fully compatible editor that translators can use when working with GlobalSight translation kits. This means we have an open-source GMS married to an open-source desktop translation environment.  We made an announcement last year about how GlobalSight 8.5 features the integration of OmegaT.  The additional instrumentation code is merged with each release of OmegaT to maintain synchronization of functionality between OmegaT and iOmegaT.

What’s next with CNGL?

We will continue our work on iOmegaT, moving towards seamless integration with the GlobalSight production workflows. Rolling out OmegaT as the open-source desktop translation environment will mean that we can use iOmegaT to periodically test the performance of the MT programs we put in production for our clients. We are also working on a number of other exciting technology projects as part of our overall research program as CNGL platform industry partners. Welocalize and CNGL make a good team. We often host CNGL developers at our Dublin site, and we have some great collaborative work in the pipeline due for release soon. The iOmegaT project is a great example of how academia and industry have joined together, the culmination of the work being Welocalize licensing the technology from Trinity College Dublin, as part of CNGL.

Dave Clarke is Welocalize Principal Engineer for Language Tools