Germany-based Christian Zeh is a Business Development Director at Welocalize. Christian is one of the presenters at tekom and tcworld 2014, taking place in Stuttgart, Germany, November 11-13. In this blog post, he outlines common challenges in localizing user-generated content (UGC) and options for addressing them.
User-generated content (UGC) is the fastest growing content type on the Internet. UGC is found everywhere, from travel and review websites to technical forums and online marketplaces. It is part of the countless streams of news and blog posts published every day, as well as of every social media site. Content such as consumer reviews, forums and blogs can have a major influence on people’s buying decisions. Studies show that consumer reviews are the second most trusted form of advertising after word-of-mouth.
As Elaine O’Curran discussed in her blog post, “MT and Post-Editing UGC for Travel and Technology”, travel websites increase considerably in value when tourists can read the content in their native language. Technical forums, in turn, can help users solve problems that were not anticipated when the operating manual or online help was originally written.
Technical forums allow users to share their experiences and give manufacturers useful information about potential improvements or sought-after features. This is why publishing selected UGC in multiple languages can bring companies significant benefits.
Localizing UGC and social media content poses several challenges:
- The amount of UGC and social content is continuously increasing and the data volume is immense. At the end of 2011, TripAdvisor had around 60 million reviews (source: The Naked Truth About Hotel Reviews, 2012). Today, TripAdvisor reports 170 million reviews and opinions covering 4 million accommodations, restaurants and attractions, and its main site operates in 45 countries worldwide.
- UGC and social content is often very short-lived. New comments are added constantly, and existing information may become irrelevant within minutes or hours.
- UGC and social content is created by private individuals, many of whom are not native speakers, and it often uses non-standard language.
- In technical forums, content may be written by “techies” who use their own jargon.
- The sheer multitude of authors increases the lexical and stylistic diversity of this content.
Because of these challenges, it is often infeasible to employ an army of human translators to handle such high volumes of perishable, ambiguous content. When weighing localization options, organizations must also decide which level of quality is acceptable for their brand. For much UGC, it is enough for readers to understand the content, even if the translation is not 100% linguistically accurate. Machine translation (MT) therefore plays an important role in localizing UGC, as it overcomes several of these key challenges.
When, where and how to use MT for UGC is essentially determined by quality expectations, which differ depending on the purpose, target group and desired effect of the localized content.
Options for Translating UGC
No MT: If the objective of the UGC is to trigger an “emotional impact,” the best option is human translation. Examples include high-visibility marketing content with high brand value, such as CEO blogs, first-page product reviews and branded social media content.
MT with Post-Editing: In general, UGC that is expected to deliver useful information, such as forums, reviews and knowledge bases, should be post-edited, unlike general social media posts. For important or high-visibility UGC (such as the Microsoft Knowledge Base), post-editing is necessary. The extent of post-editing again depends on quality expectations, and the scope varies widely: from a simple plausibility check that prevents severe misrepresentations or offensive statements, to a full post-edit that brings the text up to human translation level. Post-editors do not need to be language or translation graduates; they often belong to a relevant user or interest group, such as travelers, techies or bloggers. Post-editing of UGC can be well suited to crowdsourcing.
Raw MT: This may be suitable for massive volumes of UGC, which can be published automatically provided the MT output meets a minimum score under a defined scoring system. Utility scoring can be used to rate the comprehensibility of unedited MT, and this rating can be carried out by a trained translator. Alternatively, for the bulk of UGC, automated utility scoring generated by the MT engine itself must be taken into consideration. Studies indicate that about 50% of unedited MT is considered incomprehensible and simply not publishable. There is evidence that statistical MT engines, such as Microsoft Translator Hub, perform better with UGC because of the large number of translation examples available to them.
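To make the three options more concrete, here is a minimal sketch that routes individual UGC items to human translation, MT with post-editing, or raw MT gated by a utility score. It is a hypothetical illustration, not Welocalize's method: the names `UGCItem`, `Route` and `route_ugc`, the purpose labels, the assumption that the engine returns a utility score between 0.0 and 1.0, and the 0.7 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    HUMAN_TRANSLATION = "no MT: human translation"
    MT_POST_EDIT = "MT with post-editing"
    RAW_MT_PUBLISH = "raw MT, published automatically"
    RAW_MT_HOLD = "raw MT below threshold, not published"


@dataclass
class UGCItem:
    text: str
    purpose: str            # hypothetical labels: "emotional", "informational", "social"
    high_visibility: bool   # e.g. CEO blog, first-page review, knowledge base article
    mt_utility: float       # hypothetical automated utility score, assumed in [0.0, 1.0]


# Hypothetical minimum score for unattended publication; a real value would
# come from the organization's defined scoring system.
MIN_UTILITY = 0.7


def route_ugc(item: UGCItem, min_utility: float = MIN_UTILITY) -> Route:
    """Pick a translation option for one UGC item using the three tiers."""
    if item.purpose == "emotional":
        # Content meant to trigger emotional impact: human translation, no MT.
        return Route.HUMAN_TRANSLATION
    if item.purpose == "informational" or item.high_visibility:
        # Forums, reviews, knowledge bases and other visible UGC: MT + post-editing.
        return Route.MT_POST_EDIT
    # Remaining bulk content (e.g. general social posts): raw MT,
    # published only if the automated utility score clears the threshold.
    if item.mt_utility >= min_utility:
        return Route.RAW_MT_PUBLISH
    return Route.RAW_MT_HOLD


if __name__ == "__main__":
    samples = [
        UGCItem("CEO blog post", "emotional", True, 0.90),
        UGCItem("Forum answer", "informational", False, 0.85),
        UGCItem("Social media comment", "social", False, 0.82),
        UGCItem("Social media comment", "social", False, 0.35),
    ]
    for sample in samples:
        print(f"{sample.text}: {route_ugc(sample).value}")
```

Keeping the threshold as a parameter reflects the point that quality expectations drive the choice: a brand could raise it for a stricter minimum, or send held items to post-editing rather than leaving them unpublished.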
For the localization of UGC and social media, one thing is certain: you need a strong, reliable team of localization professionals to put the right strategy in place.
The tekom and tcworld 2014 conference will be held November 11-13 at the International Congress Center (ICS) in Messe Stuttgart. Christian Zeh will be delivering his presentation, Localizing UGC Content, on Tuesday, November 11 at 11:15AM. Welocalize is sponsoring and exhibiting at tekom in Stuttgart. You can find the Welocalize team at booth #2G09.