MT and Post-Editing UGC for Travel and Technology
US-based Elaine O’Curran is Training Manager on the Language Tools and Automation Team at Welocalize. Elaine will be presenting at the 11th Biennial Conference of the Association for Machine Translation in the Americas (AMTA-2014) in Vancouver this month. In this blog post, Elaine gives insight into the translation approach for user-generated content (UGC) in the travel and technology sectors.
We now find that most of our clients have some UGC in their content portfolio, and there is increasing demand to make this content available to a global audience. For simple cost reasons, human translation is often not a viable option, and publishing UGC as raw machine translation (MT) is a common way to meet the demands of this high-volume, highly perishable content type. Raw MT does not always meet the desired quality standards, and this is where the language service provider (LSP) comes into play: human translation or post-editing to “just the right quality level” is needed to provide optimal results.
There is no widely accepted definition of UGC. A 2007 report by the OECD, Participative Web: User-Created Content, defines UGC as i) content made publicly available over the Internet, ii) which reflects a certain amount of creative effort, and iii) which is created outside of professional routines and practices. In other words, this is online content produced by ordinary users who are not technical writers or marketing or media professionals.
There are many types of UGC, and typical examples can be found on web forums, wikis, social networking sites, podcasting platforms, file sharing sites, blogs and online marketplaces for consumers. At Welocalize, there are some types of UGC that we frequently encounter in the technology and travel industries. With many companies in these industries using the Internet (or cloud) as the backbone for their business, UGC is a key component of their business and marketing strategy. Both industries attract a lot of UGC, especially in technical support forums and on travel review sites.
UGC in the form of reviews is characterized by high volumes and perishability. Any mistranslated review, travel or otherwise, can have a negative impact on business, which is why clients prefer to send MT-translated reviews through a very light post-editing cycle. Light post-editing can fix critical errors such as misrepresentations, offensive statements, dropped or added negations, and untranslated or missing words.
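One of the critical errors listed above, a dropped or added negation, can be roughly pre-screened before a post-editor sees the segment. The Python sketch below is only an illustration and not a description of any Welocalize tool: the English-to-Spanish language pair, the two small word lists and the function names are all assumptions made for the example, and a word-list check like this will miss many real cases.

```python
import re

# Hypothetical, deliberately tiny word lists; a real check would need
# far better linguistic coverage than this.
EN_NEGATIONS = {"not", "no", "never", "none", "nothing"}
ES_NEGATIONS = {"no", "nunca", "ninguno", "ninguna", "nada", "ni"}


def has_negation(text, negations, contraction_suffix=None):
    """Return True if any token is a listed negation word.

    contraction_suffix (e.g. "n't") optionally catches English
    contractions such as "didn't" (straight apostrophe only).
    """
    tokens = re.findall(r"[a-záéíóúñü']+", text.lower())
    for token in tokens:
        if token in negations:
            return True
        if contraction_suffix and token.endswith(contraction_suffix):
            return True
    return False


def negation_mismatch(source_en, mt_es):
    """Flag a segment when exactly one side contains a negation."""
    source_negated = has_negation(source_en, EN_NEGATIONS, "n't")
    target_negated = has_negation(mt_es, ES_NEGATIONS)
    return source_negated != target_negated


# The negation was dropped in the MT output, so the segment is flagged.
print(negation_mismatch("The room was not clean.",
                        "La habitación estaba limpia."))     # True
# Both sides are negated, so the segment is not flagged.
print(negation_mismatch("The room was not clean.",
                        "La habitación no estaba limpia."))  # False
```

A flag from a check like this would only route the segment to a human post-editor; it does not decide whether the translation is actually wrong.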
UGC in the form of technical support forums can reduce the cost of providing support to the client. Therefore, it is in the interest of technology companies to provide this content to a more global audience. Frequently, MT is deployed directly on the website to provide on-demand translation. We generally only post-edit forum content that meets certain criteria set by clients, such as a high number of visits or clicks. The aim is to provide technically accurate translations that will enable readers to solve the problem they are experiencing. Style and fluency are not important.
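The selection step described above, post-editing only forum content with a high number of visits or clicks, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the ForumThread record, the select_for_post_editing function and the threshold values are hypothetical, since in practice each client sets its own criteria.

```python
from dataclasses import dataclass


@dataclass
class ForumThread:
    """Hypothetical record for one forum thread and its traffic figures."""
    thread_id: str
    visits: int
    clicks: int


def select_for_post_editing(threads, min_visits=1000, min_clicks=200):
    """Return threads that meet either traffic threshold.

    The default thresholds are made-up example values, not real
    client criteria.
    """
    return [
        thread for thread in threads
        if thread.visits >= min_visits or thread.clicks >= min_clicks
    ]


threads = [
    ForumThread("reset-password", visits=5400, clicks=320),
    ForumThread("obscure-driver-bug", visits=40, clicks=3),
    ForumThread("install-error", visits=800, clicks=250),
]

selected = select_for_post_editing(threads)
print([thread.thread_id for thread in selected])
# ['reset-password', 'install-error']
```

Threads that fall below both thresholds would stay as raw MT, which matches the on-demand translation already deployed on the website.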
There are many examples online that show how MT output can sometimes go awry. Light post-editing can be performed much more quickly than translation, so the combination of MT and post-editing can still handle higher volumes of content than human translation alone. For post-editing, you don’t need professional translators, simply skilled post-editors, ideally with knowledge of the industry, company and product.
For UGC that is expected to deliver “high visibility” information (forums, reviews, knowledge bases), any brand must consider using a combination of MT and light post-editing. The bulk of UGC is published with raw MT; however, each client must determine the impact that the UGC has on the business and brand.
Elaine O’Curran, Training Manager for Welocalize’s Language Tools and Automation Team
The AMTA-2014 conference will be held October 22-26 at the Renaissance Vancouver Harbourside Hotel. Elaine will be delivering her presentation, “Machine Translation and Post-Editing User Generated Content” at 2:30PM on Thursday, October 23.