Text-to-Speech Localization for Global Brand Marketing

By Darin Goble

Multimedia is on the rise, reaching areas it never reached before, thanks to advances in technology and the rapid growth of video- and audio-sharing platforms. For many global brands, multimedia is the fastest-growing channel, eclipsing standard sales and marketing techniques.

According to the video-sharing website YouTube, the site has over 1 billion users and 4 billion video views per day, and 60% of a creator’s views come from outside the creator’s home country. It is no surprise that global companies increasingly use video and other multimedia techniques to drive brand and social media campaigns.

Audio and video are already used heavily in learning materials; however, with the growing influence of sites like YouTube and Vimeo, using video to build a global brand has become an integral part of marketing campaigns.

Before the Internet and YouTube, using video and television advertising to reach global audiences would have been prohibitively expensive and out of reach for many brands. Now, many of the top global brands consistently use video to reach global audiences. According to Pixability*, 99 of the top 100 global brands are on YouTube, and the top 100 brands have invested approximately $4.3 billion in the creation of video assets to drive global marketing campaigns. For Generation Z (those born after the turn of the century), viral brand videos and social media campaigns delivered through various devices are part of everyday life.

In the localization industry, we are seeing more and more requests for multimedia localization, especially video. Localizing multimedia content can be a lengthy and expensive process. Hiring multiple voice talents and studios and sourcing the right editing, sound and engineering expertise can be a significant investment of time and money. However, the latest developments in text-to-speech (TTS) technology have made multimedia localization a viable option for many global brands. Certain video content does not have to be localized to the same high production standards as a film or television advertisement.

Innovations in TTS are saving global brands time and money. Rather than have people sit in a studio to record the multilingual versions, scripts can be loaded into synthetic voice software, which turns the written word into phonetic text and, from there, into synthesized speech. Years ago, TTS wasn’t an option for many companies; the technology was clunky and the output too robotic. Recent technological advances have put audio track localization well within reach using TTS techniques. Plus, the more scripts you feed into the TTS engine to train it, the more intelligent it becomes, enabling clients to leverage linguistic assets, further reduce translation costs and improve quality.

In addition, marketing and brand videos distributed via social media sites are different from the polished TV advertisements of the “Mad Men” days. Such content does not have to be localized to the same high production standards. Techniques like TTS produce localized output that is perfectly acceptable to the target audience and will trigger the desired response.

Welocalize has recently developed a specialized solution for text-to-speech, weVoice, which we demonstrated at Learning Solutions and Expo this year. We’ve seen some great success with global brand clients. If you are interested in a demonstration, please contact us and we can show you how global brands are using weVoice technology today.

TTS is one of the many localization techniques that are evolving to meet future client needs. As global brands adapt their content, we adapt our localization strategies to support their globalization strategies.



Darin Goble is Senior Director at Welocalize. Based in Portland, Oregon, he has worked in the language services industry for over 15 years and leads a global team focused on driving unique localization strategies for a number of high-profile global brands.

Further Reading: Text-to-Speech for Localization of Learning Multimedia

*Top 100 Global Brands on YouTube PixTV30, Pixability