Posts

What E-Commerce Teams Need from a Strategic Language Service Provider

At Welocalize, we implement localization solutions for companies and global brands across different verticals and of different shapes and sizes. What do they have in common? The vast majority need to sell their products and services online. We now live in the age of the online consumer, and the ability to learn about, engage with and purchase products and services online is key to meeting customer expectations.

Companies that once were defined as ‘traditional’ store-front retailers now fully embrace e-commerce to the extent that it is core to their business, supporting functions as diverse as sales, product information management, brand marketing, customer service, and crucially, international growth.

The e-commerce team’s function and its role in the dissemination of product, brand and company information is now more critical than ever. With inventory and source content typically managed and channelled by a central team, the localization function for e-commerce provides essential support in supplying the language variants and helping to capture global audiences and revenue.

Here are the top five characteristics that e-commerce teams need to look for when choosing a strategic language service provider (LSP):

SCALE: Large e-retailers need to translate product descriptions for many thousands of SKUs. In today’s world of fast fashion, quick-moving trends and seasonal ranges, there is a constant churn of new content for translation, with initial launches for new languages or seasons sometimes in the millions of words.

Select a partner that has the following capabilities:

  • Ability to support all target languages and locales
  • A super-robust, scalable supply chain for each language
  • Ability to turn around large volumes of content in short time-frames
  • Excellent purchasing power within the supply chain to give you best value for money

TECHNOLOGY & AUTOMATION: In the retail sector, there are high volumes of content, often with short, aggressive time-scales and potentially many individual hand-offs. There are also many participants in the supply chain, including authors, PMs, translators, reviewers and others, and potentially many language requirements. You will want to automate as much as possible and reduce manual steps; a minimal connector sketch follows the capability list below.

Select a partner that has the following capabilities:

  • Connectors to your PIM or e-commerce platform to pick up and deliver the content and eliminate manual hand-offs
  • Workflow automation to automate and accelerate the workflow, from source to translation to review to delivery, incorporating translation memory, glossary tools and review tools
  • Automated validation tools to capture simple errors, support the work of the translators, assure file integrity and avoid corruptions
  • Machine translation (MT) engines, customized for your content, to reduce cost and help with scale, teamed with expert, experienced post-editors
  • A client-facing portal which allows your stakeholders to send in and track ad-hoc requests outside of the standard workflow
  • Potentially proxy or hosted translation solutions to reduce your internal IT footprint
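
To make the automation point concrete, here is a minimal sketch of what a connector-driven hand-off might look like: poll a product information management (PIM) system for changed copy and submit it to a translation management system (TMS) as jobs. The endpoints, field names and locales are hypothetical placeholders, not any particular vendor's API.

```python
# Illustrative connector sketch: poll a PIM for updated product copy and hand
# it to a TMS as translation jobs. Endpoints and field names are hypothetical.
import requests

PIM_URL = "https://pim.example.com/api/products"   # hypothetical PIM endpoint
TMS_URL = "https://tms.example.com/api/jobs"       # hypothetical TMS endpoint
TARGET_LOCALES = ["de-DE", "fr-FR", "ja-JP"]

def fetch_changed_products(since_iso):
    """Ask the PIM for products whose copy has changed since the given timestamp."""
    resp = requests.get(PIM_URL, params={"changed_since": since_iso})
    resp.raise_for_status()
    return resp.json()["products"]

def submit_translation_job(product):
    """Create one translation job per changed product, covering all target locales."""
    payload = {
        "source_locale": "en-US",
        "target_locales": TARGET_LOCALES,
        "content": {"sku": product["sku"], "description": product["description"]},
    }
    resp = requests.post(TMS_URL, json=payload)
    resp.raise_for_status()
    return resp.json()["job_id"]

if __name__ == "__main__":
    for product in fetch_changed_products("2015-06-01T00:00:00Z"):
        job_id = submit_translation_job(product)
        print(f"Submitted {product['sku']} as job {job_id}")
```

In practice this logic would typically live in middleware or in the platforms' own connectors, wrapped in scheduling, error handling and reporting.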

QUALITY MANAGEMENT: Quality, impact and customer experience are everything. It is important that your brand is properly represented in the target language market and that the experience for the customer is flawless and culturally appropriate.

A strategic LSP should have a transparent and robust quality management system, which they can show you in detail. Their teams should include experienced quality managers who can capture your specific requirements and preferences and ensure these are implemented throughout the translation process. LSPs should be able to define, influence, monitor, measure and control the desired quality levels.

In large-scale translation projects, things will not always go right, so the approach to and effectiveness of problem solving are important. How your LSP helps the outputs go from good to great matters even more than delivering good sample translations at the beginning.

Select a partner that has the following capabilities:

  • Defined quality management processes for complex workflows
  • Quality certifications based on ISO and industry standards
  • Dedicated language quality review, testing and in-country resources

SEO & MULTILINGUAL DIGITAL MARKETING SUPPORT: Search engine optimization (SEO) is foremost in the minds of most e-commerce decision makers. While much attention is given to SEO strategy for English, often things fall down when you start to scale across many languages.

A lot of money is invested in e-commerce stores and subsequent translations. This investment needs to be followed through with a defined multilingual SEO and digital marketing strategy to ensure performance – do not rely on the central SEO team to support all locales; they will not have the capacity or the linguistic expertise.

Select a partner that has the following capabilities:

  • Research and identify the correct keywords for your target markets; a direct translation is not enough, you need the terms people actually use when searching in the target country
  • Identify less obvious keyword opportunities for each target market; there may be easy wins
  • Ensure keywords are correctly incorporated into the content, including meta titles, descriptions and other page attributes
  • Know which search engines are more important for each market
  • Provide technical SEO support
  • Prioritize spend by creating content for the most important landing or category pages
  • Help with multilingual digital marketing upon launch to drive traffic including paid search, ad creation, social media outreach and engagement and link building strategies

THE RIGHT TEAM

Select a team that shares your passion and motivation to succeed. E-commerce today is central to any retailer’s strategy. It is important that your LSP understands the stakes, the risks and the visibility involved, and can add value to your team at every step of the way.

External program managers and account managers may end up liaising with many of your internal teams on a daily basis, including: development and IT teams, e-commerce vendors, creative teams, external agencies, site merchandisers and PIM personnel.

Priorities will shift, unforeseen issues will arise and new requirements and complications will abound. Your LSP needs to be flexible, proactive and risk-aware, and must show they can own and drive the localization roadmap, assuring integration of technology, content and resources across many languages so that you can launch on time. Momentum and urgency need to be maintained all the way through the supply chain. As a client, your needs must be articulated and advocated across the LSP’s internal teams and functional leads. Use their internal experts, including digital media managers, quality managers, solution architects and MT experts, to brainstorm, solve problems and create value for your internal team.

An LSP’s knowledge of and commitment to integrating technology, content and resources across many languages will mean you launch your multilingual e-commerce on time, on brand and with the desired results.

I am interested to know whether you agree with these five characteristics. What else would you add to the list? Please send me your thoughts at Robert.martin@welocalize.com.

Robert

Based in London, Robert Martin is Business Development Director at Welocalize.

LinkedIn: https://uk.linkedin.com/in/robertglobal

Twitter: @robert_global

 

 

 

Source Authoring Improves Machine Translation Programs

Machine Translation (MT) is a valuable way to reduce localization costs and get to market faster. MT can also be a complex process, prone to quality issues and excessive post-editing.

MT is fast becoming a significant part of many localization workflows, and raw “gisting” MT, post-editing MT (PEMT) and conventional human translation often coexist within the same localization program. At Welocalize, with PEMT we can see between 10% and 100%+ productivity gain, depending on language, content complexity and desired quality level. For many clients, we post-edit MT output for a wide range of content types, including technical documentation, marketing and training materials, UI, website content, UA and consumer support documentation and user-generated content (UGC). One way we look to improve MT quality levels is to assess how “MT-friendly” the source content is. Content optimization and pre-editing for MT across a wide range of source texts and styles can be a good solution for improving output, keeping costs down and keeping the volume of publishable content high.

I recently delivered a joint Welocalize and Acrolinx webinar with Olga Beregovaya, Welocalize Vice President of Technology Solutions, and a number of content software experts from Acrolinx. The joint webinar, New Breakthrough with MT? The Secret is in the Source, shared secrets and best practices on how to optimize source content and increase MT readiness. Using sample data from several domains, we wanted to investigate whether improving the source authoring works and whether source language optimization software improves the overall effectiveness and efficiency of the MT workflow.

Typical MT Output Issues

There are a number of issues associated with MT output, including: capitalization, punctuation, spacing, inconsistent terminology, word order, omissions and additions of text and compound formation. Many of these issues can be controlled and resolved by introducing the concept of “quality at the source.”

Case Study Exercise

In the webinar, we discussed the methodology and results of a case study exercise undertaken by the Welocalize and Acrolinx teams, using MT, PEMT and source content optimization software to address the quality of the source content.

We took a number of samples with the approval of our clients, totaling 1,000 words, translating content from English into German. Each sample went through customized MT engines for translation. In the first cycle, the source was left unedited. In the second cycle, the sample was analyzed by Acrolinx technology, which proposed a set of changes based on Acrolinx “writing for MT” rules.

As a result, 52% of the source content was re-authored based on Acrolinx recommendations to improve the source. With no source content analytics, 52% of the MT output required post-editing; after introducing the Acrolinx-proposed changes, only 43% did. 68% of the re-authored segments produced better MT quality according to human ranking, and the PE distance (how much effort is required to bring MT output to the desired quality level) improved by a significant 9%.

By addressing the source quality, the improvement in PE distance translates to:

  • 7-8% productivity gain for translators
  • 5% post-editing discount improvement
  • 5% time-to-market improvement
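
As a rough illustration of what a post-edit (PE) distance metric captures, the sketch below computes a normalized character-level edit distance between raw MT output and its post-edited version and averages it across segments. The metric used in the study may be weighted differently, so treat this as a simplified stand-in.

```python
# Simplified illustration of PE distance: normalized character edit distance
# between raw MT output and its post-edited version
# (0.0 = untouched, 1.0 = fully rewritten). Example segments are invented.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def pe_distance(mt_output: str, post_edited: str) -> float:
    longest = max(len(mt_output), len(post_edited)) or 1
    return levenshtein(mt_output, post_edited) / longest

segments = [
    ("Das Gerät ist einfach zu benutzen.", "Das Gerät ist einfach zu bedienen."),
    ("Klicken Sie auf Speichern.", "Klicken Sie auf Speichern."),
]
scores = [pe_distance(mt, pe) for mt, pe in segments]
print(f"Average PE distance: {sum(scores) / len(scores):.2%}")
```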

Adding an additional technology layer to improve the quality of the MT input does improve the overall performance of the MT program. One of the secrets to MT success is to continually train MT engines, resulting in a more intelligent process that keeps volumes high and costs low. This methodology also applies to source content authoring. The more content is processed through the Acrolinx platform, the more intelligent it becomes over time.

At Welocalize, we are addressing the quality of the source authoring for many of our clients and are partnering with leading authoring tools developers, like Acrolinx, to improve the performance of our MT programs. The webinar generated some interesting discussions, and many of the key points we made resonated with webinar attendees. If you have any questions about MT, PEMT or using source language checking software as part of your MT program, please feel free to drop me an email at Elaine.ocurran@welocalize.com.

I’d love to hear your feedback.

Elaine O’Curran, Program Manager on the Language Tools Team at Welocalize

VIEW THE WEBINAR
New Breakthrough with MT? The Secret is in the Source

 

Disruptive Changes in Software Localization and their Impact

By Loïc Dufresne de Virel, Localization Strategist at Intel Corporation

If you have been in the industry a long time, you probably remember the old way to estimate the cost of a software localization project running on Windows (ballpark $1/word), along with the usual discussions around content being 90% complete, or even the notion of a UI freeze. You also know that today this simplistic approach no longer applies, as our environment has changed drastically. Here are a few thoughts about changes in software localization, and their potential impact.

Mobile and Agile

The wide adoption of agile methodologies for software development and the relative ease of deploying software products and applications offering new features along with bug fixes have had a significant impact on the localization business. Time-to-market and the constant need to quickly add new features to keep a growing user base engaged and entertained seem to have taken precedence over “linguistic perfection.” We need to achieve an acceptable level of quality, certainly avoiding major mistakes (incorrect technical information, culturally offensive issues, severe mistranslation); however, it can no longer be justified to spend a few extra hours in translation or review in order to avoid three typos and add a missing period.

Meanwhile, mobile users who are looking for the latest snippets of useful data expect to find relevant content in their language, in an easy-to-consume format, and then they move on. Speed of execution needs to match the new and reduced “shelf-life” of the content, while minimum charges need to disappear, perhaps in exchange for a guaranteed minimum monthly volume. In this new world, the localization buyer becomes more interested in securing from their Language Service Provider (LSP) a minimal translation “bandwidth,” in terms of average monthly volume, with often a very aggressive SLA and some assurances of quality, at a “fair” price of course.

Integration and Automation

For both the localization buyer and the LSP, this new normal requires more integration and automation than ever, which can only be achieved through a true partnership. Aggressive SLAs – we can perfectly envision a few hundred source words sent at 5:00 PM, with the expectation that translations into 20 or more languages will be delivered by 10:00 AM the next morning – can only be met using predefined rules and pre-established workflows. In this model, manual touch points need to be eliminated, or at least significantly reduced, on all sides, which requires planning, proper internationalization (a given that is sadly often overlooked), and a full integration between CMS and TMS. This becomes a major issue for the large localization buyers, as they often deal with multiple specialized CMS and need to deploy a complex infrastructure in order to facilitate an efficient localization flow; a never-ending challenge as the CMS environment keeps evolving rapidly.

For the buyer, continuous localization, resulting in a large increase in the number of smaller jobs they need to track, poses new problems, as it quickly becomes unbearable to spend even a few hours of administrative overhead (approval, reconciliation of quotes and invoices, payments, reporting, tracking of overdue jobs, eventual escalations) on each translation job when the average cost of such a job is barely $100. Human intervention needs to be limited to specific use cases, identified through advanced data analytics, while the majority of jobs flow automatically through a well-oiled machine. Are we all there today? Certainly not, but enabling this model is the ultimate goal for most in-house localization teams.
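
One way to picture “human intervention limited to specific use cases” is a simple routing rule in the job pipeline: routine jobs under a pre-approved cost threshold flow straight through, while anything unusual is flagged for a project manager. The thresholds and job fields below are invented purely for illustration.

```python
# Illustrative routing rule for continuous localization: auto-approve routine
# jobs, flag outliers for human review. Thresholds and fields are invented.
AUTO_APPROVE_COST_USD = 100.0
KNOWN_CONTENT_TYPES = {"ui_strings", "release_notes", "help_topic"}

def route_job(job: dict) -> str:
    """Return 'auto' for jobs that should flow untouched, 'review' otherwise."""
    if job["estimated_cost_usd"] > AUTO_APPROVE_COST_USD:
        return "review"                      # too expensive to approve blindly
    if job["content_type"] not in KNOWN_CONTENT_TYPES:
        return "review"                      # unfamiliar content, let a PM look
    if job["due_hours"] < 4:
        return "review"                      # tighter than the standard SLA
    return "auto"

jobs = [
    {"id": "J-101", "estimated_cost_usd": 60, "content_type": "ui_strings", "due_hours": 17},
    {"id": "J-102", "estimated_cost_usd": 420, "content_type": "marketing", "due_hours": 48},
]
for job in jobs:
    print(job["id"], "->", route_job(job))
```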

User Experience

Why do we do all this? Because UX matters. There is no doubt that providing localized content to end-users is a big component of offering them a better user experience. While multiple studies show that non-English-speaking Internet users spend more time on a translated page, even a machine-translated one, than on an English one, the cost of providing content in the local language is always part of an economic equation. Better analytics can allow us to make better investment decisions, thus controlling localization spending more efficiently.

We discussed the use of data and data analytics at the Welocalize LocLeaders Forum 2015 Silicon Valley, and everyone was in agreement that key data points can be used to make localization decisions. Trained MT solutions can be used to lower the initial translation costs of a website while quickly achieving good multilingual coverage; then a robust analysis of traffic data, coupled with some degree of A/B testing, can help refine the localization plan by investing a limited amount in post-editing or human translation of those pages that attract the most visits. Some companies use very sophisticated models to monetize web traffic, clicks, visits, etc., but at the end of the day, for most of us in the online or high-tech world, the big question is “how do we maximize the monetary value of a user over time?”
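
A back-of-the-envelope version of that traffic-driven decision might look like the sketch below: rank machine-translated pages by visits and spend a fixed post-editing budget on the pages that attract the most traffic. The page data, costs and budget are invented for illustration; a real model would also weigh conversion value, content shelf-life and language.

```python
# Illustration only: choose which machine-translated pages to post-edit,
# given a fixed budget, by ranking pages on visit counts. Numbers are invented.
pages = [
    {"url": "/de/product-a", "monthly_visits": 12500, "pe_cost_usd": 180},
    {"url": "/de/product-b", "monthly_visits": 900,   "pe_cost_usd": 150},
    {"url": "/de/support",   "monthly_visits": 7400,  "pe_cost_usd": 220},
    {"url": "/de/legacy",    "monthly_visits": 120,   "pe_cost_usd": 90},
]
budget = 400.0

selected = []
for page in sorted(pages, key=lambda p: p["monthly_visits"], reverse=True):
    if page["pe_cost_usd"] <= budget:
        selected.append(page["url"])
        budget -= page["pe_cost_usd"]

print("Post-edit first:", selected)   # highest-traffic pages that fit the budget
```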

We are still chasing the elusive “cheaper, faster, and better” mantra. This time the best approach might not be to simply push the cost per word down by another cent. We need to focus on translating less, but more relevant, content in a more efficient way, leveraging all available technology improvements (cloud, MT, integration, automation, predictive analysis) and reinvesting savings into supporting a broader, now well-justified and prioritized set of languages, reaching a larger user base.

Localization buyers and LSPs need to adjust to this new, streamlined environment, and while certain aspects of their current organizations might become obsolete, new roles and responsibilities will emerge. And as we move forward together on the Localization-as-a-Service path, we also need to get ready for the next major transformation, where spoken interactions and conversational user interfaces will replace text, screens and keyboards. The localization world is about to enter, in a way that is no longer limited to machine translation, the world of language models, NLP and algorithms.

Loïc

Loïc Dufresne de Virel, Localization Strategist at Intel Corporation

www.intel.com


 

 

 

Welocalize to Present at 18th European Association for Machine Translation Conference

Frederick, Maryland – May 7, 2015 – Welocalize, global leader in innovative translation and localization solutions, will share industry insight and expertise at the 18th Annual Conference of the European Association for Machine Translation (EAMT) taking place in Antalya, Turkey, May 11-13, 2015, at the WOW Topkapi Palace.

“I am very excited to be taking part as an invited speaker at this year’s EAMT 2015 Conference in Turkey,” said Olga Beregovaya, VP of language tools and automation at Welocalize. “EAMT is an important international conference for the MT community. It is where experts, thought leaders and users of machine translation can meet and share research, findings and new tools to help their language technology strategy.”

Featured Welocalize presentations at the 18th Annual Conference of the European Association for Machine Translation:

  • Welocalize VP of Language Tools and Automation, Olga Beregovaya will deliver her keynote, “What We Want, What We Need, What We Absolutely Can’t Do Without – An Enterprise User’s Perspective on Machine Translation Technology and Stuff Around It” at 9:30 – 10:00am on Tuesday, May 12.
  • Olga Beregovaya along with Welocalize Senior Computational Linguist Dave Landan will be presenting “Streamlining Translation Workflows with Welocalize StyleScorer” as part of the poster project and product description session on Tuesday, May 12.

For more information about the EAMT 2015 conference, visit http://www.eamt2015.org.

About Welocalize – Welocalize, Inc., founded in 1997, offers innovative translation and localization solutions helping global brands to grow and reach audiences around the world in more than 157 languages. Our solutions include global localization management, translation, supply chain management, people sourcing, language services and automation tools including MT, testing and staffing solutions and enterprise translation management technologies. With over 600 employees worldwide, Welocalize maintains offices in the United States, United Kingdom, Germany, Ireland, Italy, Japan and China. www.welocalize.com

Global Marketing Highlights from the Marketing Nation Summit

Welocalize Global Marketer Lauren Southers recently attended the Marketo 2015 Marketing Nation Summit in San Francisco. The Summit is an annual event for marketing professionals who drive global marketing campaigns and strategies. In this blog, Lauren shares the three main themes from the summit.

As a marketing and localization professional, attending the Marketo 2015 Marketing Nation Summit helped me further understand challenges faced by global marketers today and new techniques that global brands are using to increase sales and grow revenue through marketing strategies.

The Summit included high-profile speakers, including Phil Fernandez, Marketo President and CEO; Arianna Huffington, founder of The Huffington Post; and John Legend, nine-time Grammy and 2015 Oscar winner. All shared their experiences and inspirational stories.

The following summarizes three key global marketing themes highlighted throughout the Marketing Nation Summit:

1. Engagement Marketing

Marketo CEO Phil Fernandez opened the summit with a motivating keynote speech highlighting the topic of engagement. He noted, “We need to focus on marketing that is built on a real relationship with customers. We need to stop spending so much time as marketers talking and listen more.”

Phil spoke about how the fast pace of digital change will only continue. The way people interact has changed. We view and share more data than ever. Marketers need to move away from mass advertising, which is simply irritating our customers. We need to start having conversations with them on a personal level, reaching them wherever they are located.

With the growth in digital marketing, this means our customers can be anywhere; therefore, localization must be part of the overall global marketing strategy. He also explained that email campaigns are becoming a thing of the past: engagement marketing is the future. We need to start listening to what our customers want and provide a personalized journey from start to finish. And personal means speaking to customers at a local level, in their language.

The shift to engagement marketing and technological advances completely changes marketing as we know it. According to Phil, marketers will not be able to recognize their jobs in years to come.

2. Inform, Inspire, Entertain and Empower

Best known as founder of The Huffington Post, Arianna Huffington, President and Editor in Chief of The Huffington Post Media Group, gave an inspiring opening keynote at the Summit, filled with anecdotes, lessons and nuggets of wisdom for marketers.

“We create content to inform and inspire; to entertain and empower,” declared Huffington. The Huffington Post created a site entirely focused on its audience, and this is where marketing is now. We need to move increasingly into engaged marketing. “We recognize that it’s not enough to do just top-down presentations, we need to engage customers,” said Arianna.

One lesson was about adding value to people’s lives: “by adding value to people’s lives, you can move from being useful to indispensable.” Arianna also spoke about how The Huffington Post continues to disrupt itself to deliver news to an audience that no longer wanted to just read news, but wanted to consume news and share their own news. By creating trust (“trust is the new black”), recognizing that the world was changing and delivering news that people are preoccupied with for personal consumption, The Huffington Post has managed to create a community of loyal readers all over the world. The Huffington Post wants to accommodate all its readers; for example, it created new sections such as a divorce section and, to show it is not cynical, a wedding section followed. The online news site certainly embraces global audiences. There are 13 editions of The Huffington Post in 12 languages.

3. Collaboration! Teamwork! Inspiration!

John Legend wrapped up the morning’s keynotes with a fabulous performance and by sharing his journey to success. He spoke of his experiences and the lessons he has learned: “To be great you have to study the greats. I studied Al Green, Stevie Wonder and Billie Holiday to name a few and they taught me what I needed to know to be great at song writing.”

Legend stressed the importance of time, collaboration and inspiration. “Always be open to inspiration and schedule time for creativity.” He schedules songwriting sessions and explained that his reason is to hold himself accountable for his time, forcing himself not to procrastinate.

As marketers, what we can take away from the closing keynote is to look at the people who inspire us, their successes and failures, and apply those lessons to our own jobs. Finally, always be looking for inspiration, and take time to brainstorm and collaborate with others: “It’s not always about structure, it’s about inspiration,” said John Legend.

It was inspiring listening to marketers and industry leaders talking about customer engagement at the Marketo summit. In my role as a global marketer in the localization industry, this summit continued to stress the importance of building relationships and engaging with all your audiences in all key languages and cultures.

We, as marketers, should always remember to inspire, empower and most importantly listen!

Lauren

Lauren.Southers@welocalize.com

Welocalize to Present at GALA 2015 Sevilla

Frederick, Maryland – March 18, 2015 – Welocalize, global leader in innovative translation and localization solutions, will share industry insights and expertise at the annual Globalization and Localization Association (GALA) Language of Business conference, taking place in Sevilla, Spain, March 22-25, 2015, at the Barceló Sevilla Renacimiento Hotel.

Laura Casanellas from Welocalize’s Language Tools Team will be presenting “Localizing for Travel: Diverse Solutions for Diverse Needs” on Monday, March 23 as part of a special conference forum designed to address the needs of the travel and tourism sector.

“The presentation at GALA 2015 Sevilla will discuss Welocalize’s localization and language approaches and processes specific to travel and hospitality,” said Laura Casanellas, machine translation and CAT tools program manager at Welocalize. “There are diverse localization models across the travel sector, from full transcreation to raw machine translation output for gisting purposes. Welocalize works with several global brand leaders and online travel companies, enabling us an opportunity to share our best practices at this year’s GALA Conference.”

“The GALA organization and events provide a great platform for the localization industry where we can network with our colleagues and collaborate with thought leaders,” said Jamie Glass, vice president of global marketing at Welocalize. “We are delighted to share our expertise at GALA 2015 Sevilla.”

GALA Language of Business conferences are gatherings for the translation and localization community, including providers of language services, managers of global content and language technology developers. Welocalize is a corporate sponsor and member of GALA.

About Welocalize – Welocalize, Inc., founded in 1997, offers innovative translation and localization solutions helping global brands to grow and reach audiences around the world in more than 157 languages. Our solutions include global localization management, translation, supply chain management, people sourcing, language services and automation tools including MT, testing and staffing solutions and enterprise translation management technologies. With over 600 employees worldwide, Welocalize maintains offices in the United States, United Kingdom, Germany, Italy, Ireland, Japan and China. www.welocalize.com

Press release:  http://www.marketwired.com/press-release/welocalize-to-present-at-gala-2015-sevilla-2001649.htm

Welocalize StyleScorer Helps MT and Linguistic Review Workflow

Innovation is one of Welocalize’s four pillars, which form the foundation of everything we do as a business. Clients and partners rely on our leadership to drive technological innovation in the localization industry. One of our latest innovative efforts is the soon-to-be-deployed language tool Welocalize StyleScorer, which will form part of the Welocalize weMT suite of linguistic and automation language tools. One of the driving forces behind StyleScorer is Dave Landan, computational linguist at Welocalize and a key player in many Welocalize MT programs.

In this blog, Dave shares the key components of StyleScorer and how style analysis tools can help the MT and linguistic review workflow.

At Welocalize, we are constantly looking for ways to improve the quality and efficiency of the translation process. Part of my job as a computational linguist is to create tools that help people spend less time on looking for potential problems and more time on fixing them. One of my team’s latest efforts in this area is StyleScorer.

Welocalize StyleScorer is currently in the early deployment testing phase. This tool will be deployed as part of the Welocalize weMT suite of language tools around linguistic analysis and process automation. I’d like to share some of the key components of StyleScorer and the role it will play in the MT and linguistic review workflow.

What is StyleScorer?

Welocalize StyleScorer is a tool that compares a single document to a set of two or more other documents and evaluates how closely they match in terms of writing style. The documents being compared must all be in the same language; however, there is no restriction on what that language is.

The main difference between StyleScorer and existing style analysis tools is that rather than summarize types of style differences (for example: “17 sentences with passive voice”), it takes a gestalt approach and gives each document a score anywhere between 0 and 4, with 0 being a very poor match to the style and 4 being a very good match.

To do this, StyleScorer uses statistical language modeling as well as innovations from NLP (natural language processing), forensic linguistics and neural networks (machine learning) in order to rate documents on how closely they match the style of an existing body of work. Because it learns from the documents it’s given, even if you don’t have a formal style guide, StyleScorer will still work as long as the training documents can be identified by a human as belonging to a cohesive group.
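
StyleScorer itself is proprietary, but the core idea of scoring a document against a reference set with a statistical language model can be sketched in a few lines of Python. The toy version below builds a character trigram model from the reference documents, measures how well the model predicts a new document, and maps that onto a 0-4 scale; the smoothing and the cut-offs are arbitrary and purely illustrative, not the actual StyleScorer algorithm.

```python
# Toy illustration of style scoring: a character-trigram language model built
# from reference documents, with the per-trigram log-probability of a new
# document mapped onto a 0-4 scale. Not the actual StyleScorer algorithm.
import math
from collections import Counter

def trigram_counts(text: str) -> Counter:
    text = " " + text.lower() + " "
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def style_score(candidate: str, references: list[str]) -> float:
    model = Counter()
    for ref in references:
        model += trigram_counts(ref)
    total = sum(model.values())
    vocab = len(model) + 1
    candidate_grams = trigram_counts(candidate)
    n = sum(candidate_grams.values())
    log_prob = 0.0
    for gram, count in candidate_grams.items():
        # Add-one smoothing so unseen trigrams don't zero out the probability.
        p = (model[gram] + 1) / (total + vocab)
        log_prob += count * math.log(p)
    avg_log_prob = log_prob / max(n, 1)
    # Map average log-probability to 0-4 with arbitrary, illustrative cut-offs.
    return max(0.0, min(4.0, (avg_log_prob + 9.0) / 1.5))

refs = ["Click Save to store your changes.", "Select a folder, then click Open."]
# With such a tiny reference set the separation is small, but in-style text
# should still score slightly higher than off-style text.
print(round(style_score("Click Next to continue the installation.", refs), 2))
print(round(style_score("Our award-winning team delights customers worldwide!", refs), 2))
```

The same kind of score could, in principle, be used in the ways described below: to filter candidate training data, to flag out-of-style source documents, or to rank raw MT output for post-editing attention.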

How does StyleScorer help the MT workflow?

While we think StyleScorer will be very useful as part of the linguistic review workflow for human translation, we are even more excited about how it can benefit the MT (machine translation) workflow at several points of the process both on source and target language documents.

One of the key components to training a successful MT system is starting with a sufficient amount of quality bilingual data. We are seeing more and more clients who are very interested in MT; however, they don’t have a lot of bilingual training data to get started. In the past, the only option available to those clients was a generic MT engine (similar to what you’d get off-the-shelf). This gets someone started in MT, though the quality of generic engines is generally lower than engines trained with documents that match the client’s domain and style.

We can use StyleScorer to filter open-source training data to find additional documents to train from that are closest to the client’s documents. High-scoring open-source data can then be used to augment the client’s training data, which allows us to build better quality MT engines for those clients early in the project life cycle.

If some documents are getting lower quality translations from MT than others, we can use StyleScorer as a sanity check as to whether the source document being translated matches the style of the client’s other documents in the same language and domain. An engine trained exclusively on user manuals probably won’t do well on translating marketing materials. StyleScorer gives us a way to look for those anomalies automatically.

We are particularly excited about using StyleScorer on target language documents to help streamline workflows. If we run StyleScorer on raw MT output, we can use the scores to rank which documents are likely to need more PE (post-editing) effort to bring them in line with the style of known target documents. This is particularly useful for clients with limited budgets for PE and clients with projects that require extremely fast turnaround because it allows us to focus PE work where it is needed the most.

Finally, we envision StyleScorer becoming part of the QA & linguistic review process by spot-checking post-edited and/or human translated documents against existing target language documents. Translations that receive lower scores may need to be double-checked by a linguist to make sure the translations adhere to established style guides. If it turns out that low-scoring translations pass linguistic review, we use them to update the StyleScorer training set for the client’s next batch of documents.

Dave

david.landan@welocalize.com

Based in Portland, Oregon, Dave Landan is a Senior Computational Linguist for Welocalize’s MT and language tools team.

How to Localize Global Marketing – Videojet Case Study

VIDEOJET 300DPI LOGO pieni-1For any global organization, implementing a standardized marketing strategy, in one source language, assumes that everyone who touches your product or service speaks the same language and has the same cultural approach. With so much digital content, published and distributed online, today’s marketers have to carefully consider their marketing strategies and integrate localization in the overall global marketing plan.

The world’s largest coding and marking company, Videojet, has a global workforce and a large distribution partner network dispersed across 26 countries. They need to continuously communicate and roll out global marketing campaigns to local markets. This is a high priority for the organization. Digital and printed branded materials are produced to promote and educate Videojet’s communities about their wide product range as part of the overall globalization strategy.

Part of their globalization strategy was to produce marketing collateral in over 17 languages. Videojet chose Welocalize as their partner to handle the localization of all branded marketing content. Welocalize teams work with Videojet to develop and implement a localization strategy to support local product launches. This includes localizing global marketing materials like email campaigns, white papers, multimedia and much more.

The results thus far are a 50% increase in translation volume and a 35% saving on overall translation spend due to improved processes and translation memories. This has been achieved by maturing and centralizing the localization process, automating the translation workflow and utilizing key technologies to help all teams maximize productivity.

“With Welocalize, we have been on a real journey to improve and increase Videojet’s level of localization maturity and get to a place where we’re producing high quality, global marketing materials to support Videojet’s international business strategy,” said John Coleman, Marketing Director, Videojet.

You can find out more about how Welocalize helped Videojet to localize the global marketing strategy by reading the full case study. Click here to read: videojet case study – welocalize

A Refresh on MT Post-Editing

The Globalization and Localization Association (GALA) recently asked machine translation expert Olga Beregovaya, Vice President of Language Tools and Automation at Welocalize, to be the organization’s GALAxy Guest Editor. In the GALAxy Newsletter Q4 2014, Olga provides a fresh perspective on a number of MT trends and hot topics in the feature, Letter from the Guest Editor: MT Post-editing — A Fresh Perspective.

Why did GALA choose Olga to edit this issue? Olga is a well-regarded language services advisor who works with multinational organizations on MT and post-editing strategies and implementations. As the Guest Editor, she was charged with the task of selecting the most relevant topics and contributors, while working closely with the GALAxy editorial team to produce a high-impact edition of the popular newsletter for Q4 2014. The latest issue shines a light on the current trends, opportunities and challenges in MT post-editing, as well as the impact it has on the future of the translation and localization industry.

“When I was offered the role of Guest Editor for this issue of GALAxy Newsletter, I knew immediately who I would want to reach out to for their insights and what aspects of this exciting new field I would want the issue to cover,” said Olga Beregovaya. “The process was a great experience and the GALA editorial team are fantastic. I hope the readers get as much out of this issue of the GALAxy newsletter as I have in my role as editor.”

Here’s a quick summary of the lead articles and authors that were included in the publication:

If you are considering machine translation or would like to talk about any of the topics raised in the GALAxy newsletter, reach out to Olga at Olga.Beregovaya@welocalize.com.

For information about Welocalize’s weMT solutions, click here.

Welocalize is a member of GALA.

MT and Translator Speed: A Welocalize Interview with John Moran

Interview by Louise Law, Welocalize Communications Manager

I recently met with John Moran, an experienced translator and programmer who is working on a PhD in Computer Science. John has worked closely with Welocalize and CNGL (The Centre for Global Intelligent Content) for many years. In 2011, Welocalize began its partnership with the Irish-based academia-industry body. Very shortly after Welocalize joined CNGL, conversations began between John Moran and Dave Clarke, Welocalize Principal Engineer.

John’s research idea was to gather real-time data from translators post-editing MT output, compared with translating “from scratch,” using an instrumented CAT tool that records how a translation is produced rather than just the final translation. This work has resulted in a joint development effort, with Welocalize contributing its developments to the code base, and the commercial licensing of the iOmegaT Translator Productivity Workbench from Trinity College Dublin. iOmegaT measures the impact of MT on translator speed cheaply and accurately in a professional-grade CAT tool. You can read more about it in the March release, when Welocalize announced the licensing of the iOmegaT technology in collaboration with CNGL. I caught up with John to find out his latest thoughts on MT and ask him how the iOmegaT project is progressing.

How long have you worked with Welocalize?

I worked in-house at the Welocalize Dublin office for nearly a year from 2011 to 2012, around the time Welocalize began their collaboration with CNGL. Since then, Christian Saam, the second member of the iOmegaT team in CNGL, and I have been working with the Welocalize MT team and HP to test and refine the workbench. I have about ten years of commercial application development behind me, so I am used to seeing software evolve; however, it is particularly satisfying when you can take something from proof of concept to a commercially viable solution. It’s definitely fair to say that this would not have been possible without the Welocalize team. I had touted the idea to a few translation companies and Dave Clarke at Welocalize spotted its potential right away. His engineering expertise complemented my own very well, and Welocalize already had a very advanced MT program when I came on the scene, so there was post-editing work to test it on.

Can you tell me about your PhD work?

The problem I am trying to solve is how to accurately measure the impact of MT on a translator’s working speed using a technique we call Segment Level A/B testing. I had the idea for iOmegaT after I used MT for one of my own translation clients in OmegaT, a free open-source CAT tool I use whenever I can instead of Trados. At the end I could not tell if MT had helped me in terms of working speed, as I was so caught up in the translation itself. I suspected it had, as I was able to use a few sentences without changing them and the MT gave me a few ideas for terminology that might have taken me a few seconds longer to think of without it. I wanted hard data to support that intuition. Removing the MT from random sentences and measuring the speed ratio seemed like a good way of doing the measurement.

In order to do this, I adapted OmegaT to log user activity data as the translator translates some randomly chosen sentences from scratch (A) and post-edits other sentences (B). We call translation-from-scratch HT, shorthand for human translation.

This data is later analyzed to generate something we call an HT/MT SLAB score, as “Human Translation versus MT Post-edit Segment Level A/B” score is a bit of a mouthful. For example, a +54% HT/MT SLAB score indicates that a particular translator was 54% faster using MT on a particular project. We also take the time a translator spends doing self-review into account. The system we developed to calculate this score is called iOmegaT. The “i” stands for instrumented. Others had thought of doing that using minimally functional web applications; however, we were the first to do it in a professional-grade CAT tool.
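
To make the arithmetic concrete, here is a simplified illustration of how a SLAB-style score could be derived from logged segment data: split segments by whether MT was offered, compare throughput in words per hour, and express the MT condition as a percentage gain over translation from scratch. The numbers are invented, and the real iOmegaT analysis also accounts for self-review time and other factors.

```python
# Simplified illustration of an HT/MT SLAB score: compare translator throughput
# (words per hour) on segments translated from scratch (HT) versus segments
# post-edited from MT. The logged numbers below are invented.
segments = [
    # (condition, source_words, seconds_spent)
    ("HT", 14, 95), ("MT", 16, 60), ("HT", 9, 70),
    ("MT", 22, 85), ("MT", 11, 40), ("HT", 18, 120),
]

def words_per_hour(condition: str) -> float:
    words = sum(w for c, w, _ in segments if c == condition)
    seconds = sum(s for c, _, s in segments if c == condition)
    return words / (seconds / 3600)

ht_speed = words_per_hour("HT")
mt_speed = words_per_hour("MT")
slab = (mt_speed / ht_speed - 1) * 100
print(f"HT: {ht_speed:.0f} w/h, MT: {mt_speed:.0f} w/h, SLAB score: {slab:+.0f}%")
```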

What do you think are the main barriers and challenges for companies looking to use MT?

I think the main barrier is that about three quarters of translators (in Europe at least) are freelancers and the vast majority use CAT tools like Trados, MemoQ and Wordfast. These CAT tools don’t report on how MT impacts post-editing speed, so it makes it very hard to negotiate fair discounts when the translator is not working in-house.

One of the things we found by giving the same files to different translators to post-edit is that MT utility can vary from person to person. You need to have some way of identifying people who can work well with MT. Edit distance does not really capture that directly. Because not everyone finds MT equally useful, as an agency you can very quickly find yourself in a situation where the discount you are asking for is unfair. Basically, the lack of hard data on speed ratios can lead to mistrust on both sides. We think this is a problem that can be reasonably and easily solved with SLAB scores.

Can you tell us about the iOmegaT project and where it’s heading?

Aside from some refinements, I think we are where we want to be in terms of measuring the impact of full-sentence MT on translation speed. On the research side, what we want to do next is look at the impact of automatic speech recognition using Dragon NaturallySpeaking and various forms of predictive typing and auto-complete on productivity. On the commercial side, one really exciting development is that OmegaT has now been integrated with Welocalize’s GlobalSight, which is also free and open-source. That means you don’t need special workflows for productivity testing, so the testing process is much cheaper. This means we can gather speed data for longer periods to look at more gradual effects, like the impact of MT and/or speech recognition technology on translation speed in terms of words-per-hour over weeks and months.

What’s next on the horizon for CNGL and iOmegaT?

Right now, the iOmegaT CAT tool and the utilities and analysis software that go with it require a good deal of technical ability to use. For that reason we currently only engage with one new client at a time. Our focus has been on well-known corporate or enterprise end-buyers of translation with complex integration requirements, such as with SDL TMS and SDL WorldServer. This has worked well so our next aim is to develop the system into a suite that is easier to use to widen the user-base to smaller LSPs and even translators, while listening to our small core of enterprise clients to improve the software for them too.

Where do you think MT is heading in the future?

One problem is the fact that research systems in MT are being evaluated on the basis of automated evaluation metrics like BLEU. However, we know that small improvements in these scores mean little in terms of a translator’s working speed. It is an elephant in the room. There is some hope. If desktop-based CAT tools like Trados, MemoQ and Wordfast can take a page out of our book and implement iOmegaT’s Segment Level A/B testing technique, I think at least some user activity data could be shunted into research.

Researchers could collaborate with existing MT providers, who are already closely linked to publicly funded MT research. This might facilitate a tighter development and testing loop between translators who are using MT and researchers developing better MT systems for different languages and content types.

Also, we don’t have to limit ourselves to full-sentence MT. The testing technique behind SLAB scores can work just as well for other technologies like predictive typing, interactive MT, full-sentence MT and automatic speech recognition. It is going to be interesting to see how these technologies interact with each other. I think speech and MT are particularly well suited to benefit from each other, and I would like to see more industrially focused research done on that topic. Dictation using Dragon can have health benefits for translators by reducing the risk of repetitive strain injury, so it is important to shine a light on it to justify more research, even if it doesn’t really bring down the cost of translation for end buyers in the near term.

What’s one piece of advice you would offer to a global brand looking to deploy an MT program?

Don’t always believe what MT providers say about productivity improvements. Figure out how to test the impact of MT on translators first using cheaper systems like Microsoft Translator Hub and then work out how to improve on that MT baseline. Whether you use in-house translators who are closely monitored, TAUS’s DQF tools, iOmegaT or other productivity testing tools like PET, the important thing is to be able to accurately measure that which you wish to improve so you can see the impact of small changes. In software engineering, we call this test-first development. iOmegaT and Segment Level A/B testing makes that testing process cheaper so you can do it more, or, indeed, all the time.

About John Moran: Since the ’90s, John has worked variously as a lecturer in translation at Trinity College Dublin, as a translator in his own LSP and as a consultant software engineer for leading global companies like Cap Gemini, Siemens and Telefonica. He is currently writing a PhD in Computer Science on the topic of CAT tool instrumentation. You can find him on LinkedIn at https://www.linkedin.com/profile/view?id=18681141.

Find out more about weMT here.

For more information on iOmegaT, check out Dave Clarke’s blog, Welocalize, CNGL and iOmegaT: Measuring the Impact of MT on Translator Speed.

All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them. 

 

EAMT Conference 2014: Welocalize Language Tools Team Overview

The Welocalize Language Tools team attended and presented at the 2014 EAMT Conference in Croatia. In this blog, Laura Casanellas, Welocalize Language Tools Program Manager and presenter at EAMT, provides her highlights and insights from her Welocalize colleagues who took part in the conference.

Just like Trento in 2012 and Nice in 2013, the Welocalize Language Tools Team participated in the Annual Conference of the European Association for Machine Translation (EAMT). The conference took place June 16 – 18 in the city of Dubrovnik, Croatia and four members of the Welocalize Language Tools team attended:

Olga Beregovaya, VP of Language Tools, and Dave Landan, Pre-sales Support Engineer, presented a project poster on “Source Content Analysis and Training Data Selection Impact on an MT-driven Program Design with a Leading LSP.”

Lena Marg, Training Manager, and I delivered our presentation “Assumptions, Expectations and Outliers in Post-Editing.”

We take the EAMT conference and associated conferences (International, Asian and American) seriously, as most of the important developments that are currently taking place around machine translation (MT) are presented and followed up in those forums.

As a global language services provider (LSP), Welocalize adds value to the EAMT conference by being able to share real-life MT production experiences, demonstrated through thorough analysis of large and varied quantities of actual data. We are privileged in that we work in a real scenario where some of the new technologies around natural language processing (NLP) and MT can be tested in depth.

In their poster, Source Content Analysis + Training Data Selection Impact – EAMT POSTER by Welocalize, Olga and Dave stressed the importance of preparing the training corpus in advance and matching it to the specific requirements of the content that will subsequently be translated. To give an example, many translation memories come from different projects created at different points in time. They may contain inconsistencies, or the sentences in these translation memories can simply be too long or may contain a lot of “noisy” data. They need to be cleaned up before they can be used as engine training assets. Going deeper into the possibilities of automatic data selection and matching it with the source content, Olga and Dave spoke about our suite of analytic applications, divided between proprietary tools like Candidate Scorer, Perplexity Evaluator and StyleScorer, and others that are being developed as part of an industry partnership with CNGL: Source Content Profiler and TMT Prime.
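
The kind of translation memory clean-up described above can start with a simple pass over the segment pairs that drops obviously unusable entries before engine training. The sketch below shows a few illustrative filters (length caps, length-ratio checks, untranslated copies); production pipelines typically apply many more.

```python
# Illustrative TM clean-up before MT training: drop segment pairs that are
# too long, badly mismatched in length, or not actually translated.
MAX_WORDS = 60          # overlong sentences add little to SMT training
MAX_LENGTH_RATIO = 3.0  # big source/target length mismatch suggests misalignment

def keep_pair(source: str, target: str) -> bool:
    src_len, tgt_len = len(source.split()), len(target.split())
    if src_len == 0 or tgt_len == 0:
        return False
    if src_len > MAX_WORDS or tgt_len > MAX_WORDS:
        return False
    if max(src_len, tgt_len) / min(src_len, tgt_len) > MAX_LENGTH_RATIO:
        return False
    if source.strip() == target.strip():
        return False            # "translation" is just a copy of the source
    return True

tm = [
    ("Click Save.", "Klicken Sie auf Speichern."),
    ("Click Save.", "Click Save."),                     # untranslated, dropped
    ("Error 42", "Der Vorgang konnte nicht abgeschlossen werden, da ein "
                 "unerwarteter Fehler aufgetreten ist."),  # length mismatch, dropped
]
clean = [pair for pair in tm if keep_pair(*pair)]
print(f"Kept {len(clean)} of {len(tm)} pairs")
```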

Olga Beregovaya’s impressions of the EAMT Conference and Welocalize’s role within it are very positive. “Overall, the great thing about the conference was the applicability of the new generation of academic research in real-life production scenarios. Many of the academic talks were relevant for the work on MT adaptation and customization that we do at Welocalize. Today, we need to cover more and more domains and content types, so domain and sub-domain adaptation is becoming the key area of our R&D. This means that we benefit greatly from academic and field research around data acquisition for training SMT systems and the relatively new developments around using terminology databases to augment the SMT training data. Not all of our clients come to us with legacy translation memories, and while some public corpora are available, we still need to rely on acquiring and aligning data ourselves.”

Dave found two presentations he attended particularly interesting, both focused on common pain points within the industry. “The challenges of using MT with morphologically rich languages are well-known, and we were happy to see interesting research in possible ways to overcome those challenges. We also found a talk on gathering training data from the web very interesting. The presenters discussed using general and specific data to train separate engines which could be weighted and combined to give improved results in cases of sparse in-domain training data. Indeed there were several innovations from academia that we are looking forward to incorporating into our bleeding-edge MT tools and processes.”

In our presentation, Lena and I focused on different challenges in a real MT production scenario: the necessity of forecasting future post-editing effort, with an emphasis on post-editors’ behavior and their personal and cultural circumstances as an important variable in the MT + PE equation. As part of a large LSP, we have been able to gather a large amount of data and focus on the quality of a number of MT outputs across different languages and content types. Our presentation elaborated on our findings around correlations between different types of evaluation methods (automatic scoring, human evaluations and productivity tests). We obtained interesting findings around the adequacy score in our human evaluation tests and the productivity gains observed in the post-editing effort. We will continue gathering data and investigating this area.

Another topic that was touched upon during the conference was the area of quality. Lena and Olga both shared their perspectives:

“After closely following the QTLaunchpad project for several months, it was particularly interesting to see and discuss results from their error annotation exercises using MQM earlier in the year. Welocalize took part in these exercises by providing data and annotator resources. The findings of this exercise are contributing to further advances both in quality estimation and quality evaluation, fine-tuning metrics further for better inter-annotator agreement, etc. These discussions also provided some immediate take-aways for our approach to evaluation.” – Lena Marg

“The other area of high relevance to us is Quality Evaluation. Again, it is great to see so many research projects dealing with predicting MT quality and utility. While it still may be challenging to deploy such quality estimation systems in production, as various CAT tools and TMS systems have their own constraints around metadata-driven workflows, it is very encouraging to know that this research is available.” – Olga Beregovaya

“A general theme of the EAMT Conference was the question of how to increase cooperation between the translation and the MT research community. In this context, Jost Zetzsche’s keynote speech was important in pointing out that translators should take an active interest in providing constructive feedback on MT and on how they work, to ensure new advances in MT developments are truly benefiting them. And yet, with the presence of some interested freelance translators, translation study researchers and a handful of LSPs presenting on MT, it would seem that progress has already been made in bringing the two sides together.” – Lena Marg

The EAMT Conference was a great opportunity to meet professionals, academics and researchers who work in the field of MT. The Welocalize team members were able to exchange ideas around the current pressing challenges surrounding MT technology and we still had time to admire the beautiful surroundings of historical Dubrovnik.

Laura
Laura.casanellas@welocalize.com
Laura Casanellas is program manager on the Welocalize Language Tools team.

Evolving MT: McAfee’s Journey into MT

Morgan O’Brien from Intel Security (McAfee Division) took part as a guest panelist on the “Evolving MT” discussion at the Welocalize LocLeaders event in Dublin. Morgan talked about McAfee’s journey to machine translation (MT) and, in this blog, summarizes some of his key points.

Everyone has to start from somewhere. That usually means starting from nothing. A few years ago, we looked at MT and had to make some decisions. We had to claw our way through the hype and the bold statements that many providers make at conferences and make some choices for ourselves. The journey was all about understanding what was out there in terms of MT offerings, understanding our own content and also understanding what our internal use cases were for MT. So here is some wisdom from that journey that may help others.

Acceptance of MT
You don’t just wake up one morning and decide that you’re going to use MT and that everyone will accept that. It starts by making the business case, and it grows from there. We started with a very simple question: “Would MT be better than Pseudo builds for QA testing?” The free APIs from Microsoft and Google at the time had limits on the amount you could machine translate with them. We needed something else, low cost, to get us off the ground. This manifested itself in a copy of Systran. It came with 11 language pairs and gave us the ability to test out theories. Perfect. You build up the reputation of MT within your organization from nowhere and let acceptance follow from the positive results.

After Pseudo…
Having used MT with some Pseudo test projects, we enhanced the quality by training in our own terminology and UI information. It was quick and dirty, but it gave the Pseudo translations the accuracy we needed. Not linguistic accuracy, but accurate enough for the MT Pseudo builds to be useful. Now that we had used MT, the next step was to start looking at other uses and providers, filtering through the marketing material and sales pitches. There is no quick answer for this, unfortunately. Only you can make the best decisions for your organization. With every bit you learn, you refocus on your internal challenge and see how it fits. Eventually you will take the plunge with some pilot and proof-of-concept projects.

Documentation Localization with Post-Edited MT (PEMT)
We completed a number of tests before we started looking at our documentation. We profiled our content. We had seen similarities in some content types and the terminology used. We also had a large-scale terminology management project running in parallel. We had also made progress with our documentation authors, who were starting to look at controlled language and terminology. Conditions were good for documentation to go to MT. It’s important to understand that MT quality is affected by several conditions.

Training Corpora for MT
A minimum number of translation units is needed to train an effective statistical machine translation (SMT) system. If you have more content than is required, profile it and select the best content for training your MT systems. Profiling content is extremely important to SMT. Our training corpus was selected based on this profiling.
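
By way of illustration only (not the exact process we used), a minimal sketch of this kind of selection might score translation memory segments against a product term list and keep the most relevant ones for training; the term list, threshold and segment cap below are all hypothetical.

```python
# Minimal sketch: select in-domain bilingual segments for SMT training
# by scoring source-side overlap with a (hypothetical) product term list.

product_terms = {"endpoint", "firewall", "quarantine", "policy", "console"}

def relevance(source_text):
    """Fraction of product terms that appear in a source segment."""
    words = set(source_text.lower().split())
    return len(words & product_terms) / len(product_terms)

def select_training_segments(tm_pairs, min_score=0.05, max_segments=300_000):
    """tm_pairs: list of (source, target) tuples from the translation memory."""
    scored = [(relevance(src), src, tgt) for src, tgt in tm_pairs]
    scored.sort(reverse=True)                      # most relevant first
    kept = [(src, tgt) for score, src, tgt in scored if score >= min_score]
    return kept[:max_segments]

if __name__ == "__main__":
    tm = [("Configure the firewall policy in the console.", "..."),
          ("Happy birthday to you!", "...")]
    print(len(select_training_segments(tm, min_score=0.1)))   # keeps only the in-domain pair
```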

I’ll admit, some of it was trial and error, and for some we employed tools to help us on the way. We kept a keen eye on BLEU* scores, as this is all you can do at this initial stage. Later on, BLEU becomes less relevant.
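
For reference, BLEU can be computed with off-the-shelf libraries. The snippet below is a minimal sketch using the open-source sacrebleu package (not a McAfee-specific tool), with invented example sentences.

```python
# Minimal sketch: corpus-level BLEU with the open-source sacrebleu library.
# pip install sacrebleu
import sacrebleu

hypotheses = ["The firewall blocks the suspicious traffic.",
              "Click Save to apply the policy."]
references = [["The firewall blocks suspicious traffic.",
               "Click Save to apply the policy."]]   # one reference per hypothesis

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")   # higher is better; useful early on, less so later
```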

In addition to having the right training corpora collected and cleaned, we also had a side-line in from our terminology management project, which allowed us to weight the most important terms from our products in our MT output. As part of compliance in the Language Quality Analysis, this was a big deal for us. If we want to post-edit our MT output, then the least we should do is ensure that the terminology suggested to the post-editor is correct, as this significantly increases the quality and speed of post-editing.
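
As a hedged illustration of checking that terminology compliance on MT output (not the actual McAfee process), a simple pass might flag segments that miss a mandated target term; the glossary entries below are invented.

```python
# Minimal sketch: flag MT segments whose output misses a mandated target term.
# The glossary is hypothetical; a real one would come from terminology management.

glossary = {
    "firewall": "pare-feu",       # en -> fr
    "quarantine": "quarantaine",
}

def terminology_issues(source, mt_output):
    """Return glossary terms whose required translation is absent from the MT output."""
    issues = []
    for src_term, tgt_term in glossary.items():
        if src_term in source.lower() and tgt_term not in mt_output.lower():
            issues.append((src_term, tgt_term))
    return issues

print(terminology_issues(
    "Enable the firewall before quarantine.",
    "Activez le firewall avant la quarantaine."))
# -> [('firewall', 'pare-feu')]  — a candidate for the post-editor's attention
```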

Quality Levels: PEMT with Gold Standard
At McAfee, we talk about the “gold standard” with regards to our output, because we don’t want to compromise our language and ultimately the localization message. It’s important to our global brand. Gold Standard means that there is compliance with our terminology, the style and accuracy of the message are good and the text flows well for the reader. It is our highest quality level and what we expect from our translators during a normal human translation workflow. This was the aim for our PEMT.

Content Optimization (Acrolinx)
We don’t have our source content optimized for MT at the moment. Why? Because it takes years of writing and localization to build an MT training corpus that matches the authoring process. This is not to say that the content was not good, but it was not optimized to standards which could significantly help MT. Starting at this point, where the authoring is now being written with structure and common standards across the organisation, means that with retraining of future MT we should see better gains. Simple rules for authoring, such as trying to keep sentence length to 12 words, can make a huge impact on your MT effectiveness. And of course, more consistent terminology usage helps hugely.
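
A minimal sketch of one such authoring rule (this is not Acrolinx, just an illustration; the 12-word threshold is the one mentioned above):

```python
# Minimal sketch: flag source sentences that exceed a target length,
# one of the simple authoring rules that helps downstream MT.
import re

MAX_WORDS = 12  # threshold suggested above; tune for your own style guide

def long_sentences(text, max_words=MAX_WORDS):
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]

sample = ("Open the console. To apply the new policy to every managed endpoint "
          "in the organization you must first synchronize the repository and "
          "then restart the agent service.")
for s in long_sentences(sample):
    print("Consider splitting:", s)
```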

Translator Productivity Expectations

Ultimately, your MT must be tested for productivity. BLEU gives you an indication of quality; however, real quality estimation is based on how much better your translation flow is after MT is introduced. This is calculated by two old favorites: time and money. For effective MT, reducing costs and increasing productivity, you need the following:

  • Good selection of MT training corpora (usually ~300K segments of bilingual data)
  • A process for cleaning, selecting and organizing (engineering and linguistic)
  • Terminology consistency (a set process and buy-in to that process)
  • Style and rules consistency in authoring (authoring standards)
  • Expected quality levels (full PE or light PE?)
  • An understanding of the human factor
  • A fit-for-purpose tool set

So, how do you calculate the time and money? This is where tools and the human factor come in.
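
Purely as a worked illustration of that arithmetic (the throughput and rate figures below are invented placeholders, not McAfee’s numbers), the calculation boils down to comparing words per hour and cost before and after MT is introduced:

```python
# Minimal sketch: the "time and money" arithmetic behind MT productivity.
# All figures are hypothetical placeholders.

words = 100_000                 # project volume
ht_words_per_hour = 350         # human translation throughput
pe_words_per_hour = 600         # post-editing throughput measured in a pilot
hourly_rate = 40.0              # cost per linguist hour

ht_hours = words / ht_words_per_hour
pe_hours = words / pe_words_per_hour

print(f"Hours saved: {ht_hours - pe_hours:.0f}")
print(f"Cost saved:  {(ht_hours - pe_hours) * hourly_rate:.0f}")
print(f"Productivity gain: {pe_words_per_hour / ht_words_per_hour - 1:.0%}")
```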

Tools
You can use tools such as TAUS DQF, or more integrated tools like iOmegaT or memoQ, for calculating time spent on segments. TAUS DQF is a good start for assessing your general productivity; however, for an actual production project, it does not allow the post-editor to use the tool and the set of macros and shortcuts that they are familiar with. I’d like to see all translator CAT tools in the future have productivity data for MT built in. This may cross some trust boundaries with translators, but as long as it has an ‘opt in’ system, I believe that most post-editors will be happy enough to share some data to help you improve your MT.

The Human Factor
The human factor is, very simply, the fact that two people working side by side will perform differently on any type of work. It’s part skill, part familiarity with tools, part experience and part acceptance.

A post-editor with little experience of post-editing and lots of experience in traditional translation will simply not perform as well in a productivity assessment as someone with post-editing experience. Post-editing experience means, at the very least, understanding patterns, quick work practices and familiarity with the tools used to implement them.

One influencing factor is how much post-editors accept the MT process and want to work with it. If they don’t like it, they simply will not perform well. It’s our challenge, as providers of the MT and the process, to bring the end post-editors closer and give them good MT output that evolves and gets better the more they work with it. Don’t let the post-editors suffer from repetitive, stupid mistakes from the MT. Make them part of the MT activity workflow and reward them by keeping MT quality high.

The MT Feedback Loop
There is currently nothing out there that automates the MT feedback loop other than retraining data. I would like to see CAT tools create a standard for reporting back on MT, through XLIFF kits or similar, so that when a kit is delivered, there is a part for the MT specialist to dive into. If this were possible, the translators who took the time to give feedback should be rewarded for doing so, as they are helping the MT improve.
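
No such standard exists today, but as a hedged sketch of what the raw material for that feedback loop could look like, the snippet below walks an XLIFF 1.2 file, assumes the raw MT was stored in an alt-trans element (an assumed convention, not something current CAT tools agree on), and measures how far the post-edited target moved from it.

```python
# Minimal sketch: pull (raw MT, post-edited) pairs out of an XLIFF 1.2 file
# and measure post-editing distance. Assumes raw MT is kept in <alt-trans>;
# real CAT tools differ, so treat this layout as a hypothetical convention.
import difflib
import xml.etree.ElementTree as ET

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

def feedback_pairs(path):
    tree = ET.parse(path)
    for tu in tree.iter("{urn:oasis:names:tc:xliff:document:1.2}trans-unit"):
        target = tu.find("x:target", NS)
        alt = tu.find("x:alt-trans/x:target", NS)
        if target is not None and alt is not None:
            mt, post_edited = (alt.text or ""), (target.text or "")
            distance = 1 - difflib.SequenceMatcher(None, mt, post_edited).ratio()
            yield mt, post_edited, distance   # high distance = heavy edits = MT worth reviewing

# for mt, pe, d in feedback_pairs("job_fr.xlf"):   # hypothetical file name
#     print(f"{d:.2f}  {mt!r} -> {pe!r}")
```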

The Application of Raw MT
There are cases for using raw MT; however, you have to be careful. Using raw MT on content where the message could be misrepresented could damage your brand and your reputation. For each raw MT use case, it would be prudent to at least run a usability study every now and again.

Most of the MT systems out there that offer on-demand translation are pretty decent, especially in the IT domain. But there is always a chance of introducing total language blunders into your output. These may have a humorous side (“Windows ME” translated as “Fenêtre Moi” in French!), but most of the time they are unprofessional and give the end user the wrong message. With too many embarrassing translations, you really should not be using raw MT on what may be high-impact content. This makes the case for ‘raw MT’ to actually be ‘refined MT’: something you can improve by taking feedback from usability studies, but that requires your own engines, as opposed to the free services out there.


Preparing for the Evolving MT panel at LocLeaders

In summary
Start small. Understand what you want as an achievable goal of quality and where you’re going to use the MT in line with your overall business goals. Understand your content and any issues it may have. Use the right tools. And assess the quality in a variety of different ways, depending on what the content type is and the varying levels of impact that content will have on your global brand.

Morgan

Morgan O’Brien is project manager at Intel Security (McAfee Division).

*BLEU: Bilingual Evaluation Understudy. An algorithm for evaluating the quality of text which has been machine translated.


Welocalize to Present at 17th Annual European Association for Machine Translation Conference

FREDERICK, MD – June 16, 2014 – Welocalize, global leader in innovative translation and localization solutions, will share industry insight and expertise at the 17th Annual Conference of the European Association for Machine Translation (EAMT) taking place in Dubrovnik, Croatia, June 16 – 18, 2014.

Senior members of the Welocalize Language Tools Team will be taking part in a number of presentations and discussions related to machine translation (MT) and automation at this year’s EAMT conference.

“Welocalize is excited to participate at this year’s EAMT 2014 Conference in Dubrovnik,” said Olga Beregovaya, VP of language tools and automation at Welocalize. “As more content is created every day, the demands for language services related to machine translation deployments are growing exponentially. EAMT is an important international conference where thought leaders and experts in machine translation can collaborate through shared research and innovations to advance our industry and meet the escalating demands.”

Featured Welocalize presentations at EAMT 2014:

For more information about the EAMT 2014 conference, visit http://hnk.ffzg.hr/eamt2014/papers.html and to find out more general information about EAMT, visit http://www.eamt.org.

About Welocalize – Welocalize, Inc., founded in 1997, offers innovative translation and localization solutions helping global brands to grow and reach audiences around the world in more than 125 languages. Our solutions include global localization management, translation, supply chain management, people sourcing, language services and automation tools including MT, testing and staffing solutions and enterprise translation management technologies. With over 600 employees worldwide, Welocalize maintains offices in the United States, UK, Germany, Ireland, Japan and China. www.welocalize.com

Welocalize Office Exchange Program from Portland to Dublin

Dave Landan is a pre-sales support engineer for Welocalize’s machine translation (MT) and language tools team and is based in Portland, Oregon. He recently spent a week in Dublin as part of the Welocalize Office Exchange Program. He shares his experience of Ireland and recaps his journey.

I’m a pre-sales support engineer on Welocalize’s MT and Language Tools team. I spend a lot of my time on MT pre-sales with external prospective clients and on supporting our existing clients’ MT programs. Some of my time is spent supporting internal clients and continually working to make MT better for everyone involved. I work from my home office in my garage near Portland, Oregon, with occasional visits to the Welocalize Portland office or down to California for meetings with clients or prospects.

I love traveling – new food, new people, and new sights. So, I jumped at the opportunity to participate in the Welocalize Office Exchange Program and visit our Welocalize Dublin office.  In addition to meeting several of my colleagues for the first time, the main purpose of my Dublin visit was for me to get up-to-speed on an exciting project that springs from our partnership with the Centre for Next Generation Localisation (CNGL).  CNGL is a collaborative academia-industry research center combining the expertise of researchers at Trinity College Dublin, Dublin City University, University College Dublin, and University of Limerick with localization industry partners. Welocalize is one of CNGL’s key industry partners and has worked with them for over three years now.

I left Portland on a Friday morning and touched down in Dublin about 13 hours later on Saturday morning.  Since I did not have any meetings scheduled until Monday, I rented a car and headed west.  Ireland is a beautiful country – lush, green, and very pastoral.  There’s so much history that I could have stopped every 5 km to see a centuries-old castle, abbey, monastery, or pub.  Instead, I put 600 km of touring on the car before passing out in my room. The next day, I enjoyed a hearty Irish Sunday breakfast of eggs, bacon, sausages, black and white puddings, grilled tomatoes, toast, and coffee. I was ready for anything!

The bulk of my week in Dublin was incredibly productive with meetings, demonstrations and presentations. I spent two days in our Dublin office meeting with colleagues, and two and a half days at Dublin City University meeting with some of the CNGL folks. Meeting my colleagues in the Welocalize office was great. It gave us a chance to dive into details that we would not normally have the time for during the regular work week. I was also able to get a good sense of what a “day in the life” is like for the people who I support. I will try to translate that into better tools and processes for my internal work.

Olga Beregovaya, Lena Marg, and Alex Yanishevsky were in Dublin as well that week. We all got together with the Dublin MT and Language Tools folks for a lovely dinner in Dun Laoghaire.

As for the CNGL work that I took part in during my visit, I can’t tell you much about that just yet.  Suffice to say there’s some exciting and unique work in the world of weMT that should come to light soon. Watch this space for updates as they are available.

We managed to accomplish all of the goals we had set before the trip with a half day to spare. On Friday afternoon, I managed to visit several of the Dublin sights that I missed in my first tour.  In all, the trip was both fun and productive and I can’t wait to go back.

Cheers,
-dave
david.landan@welocalize.com

Welocalize at memoQfest Americas Discusses MT and Better Translations

The annual translation technology conference, memoQfest Americas, took place in Los Angeles last week. David Landan from Welocalize’s Language Tools Team was invited to present about MT at the conference. In this blog, he shares his experience.

Presenting at memoQfest Americas 2014 was an important event for me in several ways. Not only was it my first time attending a memoQfest conference, it was also my first time representing Welocalize at a conference. The icing on the cake was being asked to give a talk about MT at the event.

Public speaking isn’t my strong suit. (Unless it’s about wine, but that’s another story.) I live near Portland, Oregon (rain, anyone?), so when I was asked to spend a few days in sunny Los Angeles in February, I didn’t need to think twice. I put my nervousness aside, prepared a talk, and packed my bags.

MemoQfest was a unique experience. Kilgray Translation Technologies is a fast-growing translation technology company that makes memoQ, an advanced translation environment for translators and reviewers. The company has been putting on the event for several years in their home country of Hungary. For the past few years, they have also hosted an annual event in the US. While many company-sponsored conferences are free to attend and used as an opportunity to sell product or gain exposure, memoQfest attendees pay to attend and most are die-hard memoQ users. Attendees are primarily translators and project managers (PMs), with a few executives, salespeople and tech support folks.

Kilgray uses the event for education, and it is a way for them to both offer workshops on current versions of their products and to announce what is in the works for the next release. What is most notable and refreshing is that the Kilgray folks court criticism at the event. They genuinely want to make their users happy and they take the criticism and feature suggestions seriously. This year’s upcoming release includes fixes and new features suggested at last year’s memoQfest conferences.

Machine translation (MT) is a big, exciting topic in the localization industry. MT was represented in the presentations (mine included) and in the discussions that were happening outside of the scheduled events. I presented a rather technical talk about Welocalize’s work in improving localization throughput by using a set of analytical tools to make MT better. Click here: Better translations through automated source and post-edit analysis, to view the slides from my presentation.

One thing that surprised me was how many translators use generic MT (like Google or Bing) in their day-to-day work. The thing that people need to understand is that computers are dumb. If I ask you what word comes next in the sentence “I need to pick up a dozen eggs and some milk from the …” you’d probably guess something like “store” or “market”. In statistical natural language processing, if your training data includes the phrase “milk from the” followed by “cow” often, then the system will think that “I need to pick up a dozen eggs and milk from the cow” is a perfectly reasonable sentence, because it’s the one with the best probability given the data that was used to train it.
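
To make that concrete, here is a toy sketch (with invented counts) of how a statistical model “predicts” the next word purely from frequencies in its training data, which is exactly why domain-mismatched data produces odd output:

```python
# Toy sketch: the next-word "prediction" is just the highest count in the training data.
# Counts below are invented to mirror the example in the text.
from collections import Counter

# how often each word followed "milk from the" in the (hypothetical) training data
continuations = Counter({"cow": 48, "store": 3, "market": 2})

context = "I need to pick up a dozen eggs and some milk from the"
next_word, _ = continuations.most_common(1)[0]
print(context, next_word)   # -> "... milk from the cow" if the data skews agricultural
```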

MT output is only as good as the data used to train the engine. With large generic MT engines, the training data is very noisy. In fact, some of the training data that’s automatically scraped ends up being someone else’s unedited bad MT output. Garbage in, garbage out, as they say. That’s not to say that everything you get from generic MT is garbage. Google and Bing do reasonably well for high-resource languages in general domains. If you need professional quality work, you need a professional quality MT engine. To get a professional quality MT engine, you need good data and you need translators to post-edit the MT output, depending on what quality levels are required.
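
As a hedged sketch of the kind of basic hygiene checks that keep garbage out of a training corpus (these particular heuristics are generic illustrations, not Welocalize’s proprietary tooling):

```python
# Minimal sketch: generic hygiene filters for bilingual training data.
# Illustrative heuristics only, not any vendor's cleaning pipeline.

def keep_pair(source, target, max_len=80, max_ratio=2.5):
    src_words, tgt_words = source.split(), target.split()
    if not src_words or not tgt_words:
        return False                              # empty side
    if source.strip() == target.strip():
        return False                              # untranslated copy-through
    if len(src_words) > max_len or len(tgt_words) > max_len:
        return False                              # overly long, likely misaligned
    ratio = max(len(src_words), len(tgt_words)) / min(len(src_words), len(tgt_words))
    return ratio <= max_ratio                     # wildly different lengths are suspect

pairs = [("Save the file.", "Enregistrez le fichier."),
         ("Save the file.", "Save the file."),
         ("OK", "Veuillez patienter pendant que le programme configure le produit.")]
clean = [(s, t) for s, t in pairs if keep_pair(s, t)]
print(len(clean), "of", len(pairs), "pairs kept")
```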

What we have developed within the MT and Language Tools team at Welocalize is a way to identify good, clean data so you start with a better engine. We don’t stop there — our tools can identify trouble spots in MT output and we have tools and processes for post-editing that provide a feedback loop to keep improving on every project. Exciting stuff, right?

Now, if only I hadn’t brought the rain with me from Portland to LA.

Cheers,

Dave

Email me at david.landan@welocalize.com

Welocalize to Present in LA at Translation Technology Conference memoQfest Americas

Frederick, Maryland – February 25, 2014 – Welocalize, a global leader in translation and localization, will be sharing machine translation (MT) knowledge and expertise at the 2014 memoQfest Americas conference in Los Angeles, February 27 through March 1.

David Landan from Welocalize’s Language Tools Team will be presenting “Better translations through automated source and post-edit analysis” on day two of the conference.

“My presentation at memoQfest Americas will discuss how Welocalize is developing processes and tools grounded in computational linguistics and NLP to reduce post-editing effort,” said David Landan, support engineer at Welocalize. “We analyze data using techniques from machine learning, language modeling, and information retrieval. Our data-driven approach allows us to build more targeted, more accurate MT systems.”

David will explore ways of automating training data selection using a source content analysis suite and show how the selected data led to improved MT engine quality by using Welocalize’s WeScore and StyleScorer as a way to evaluate translations. Welocalize’s WeScore is a dashboard for viewing several metrics in a single application. It makes automatic scoring of MT output easier by handling input parsing formats, tokenization, and running multiple scoring algorithms in parallel.

Machine translation (MT) is a topic with a high level of interest at localization and translation industry events. As global organizations produce more and more content and the demands for quick localization grow, Welocalize will highlight how combining translation approaches, like MT and post-edit analysis, can achieve the desired quality of output that meets time and budget goals.

memoQfest is an annual conference, hosted by Kilgray Translation Technologies, to learn more about trends within the translation technology industry. The memoQfest event also provides networking opportunities for translators, language service providers and translation end-users.

About Welocalize – Welocalize, Inc., founded in 1997, offers innovative translation and localization solutions helping global brands to grow and reach audiences around the world in more than 125 languages. Our solutions include global localization management, translation, supply chain management, people sourcing, language services and automation tools including MT, testing and staffing solutions and enterprise translation management technologies. With over 600 employees worldwide, Welocalize maintains offices in the United States, UK, Germany, Ireland, Japan and China. www.welocalize.com

Welocalize and Intuit at TAUS 2013: From Zero to MT Deployment

Welocalize Senior Solutions Architect, Alex Yanishevsky, delivered a joint presentation with Render Chiu, Group Manager, Global Content & Localization from Intuit, at the recent TAUS annual conference in Portland, Oregon.

In their presentation “How STE and Analytical Tools Enabled MT Program”, Alex and Render shared valuable insights about the Welocalize-Intuit machine translation (MT) program. They specifically detailed experiences and best practices in going from zero to MT deployment across 11 languages in a short 90 days.

Their TAUS presentation focused on the role of language tools and analytics in meeting a global organization’s need for fast product expansion with localized solutions. Alex and Render presented how Welocalize and Intuit leveraged publicly available data to train an initial set of MT engines and build a business case to go into production with MT.

Alex Yanishevsky shares his five key highlights from the presentation:

  • The Welocalize and Intuit MT program was deployed in 3 months for 11 languages
  • We trained Microsoft Translator with significant improvement over baseline engines on very sparse bilingual data
  • Intuit’s adherence to Simplified Technical English made MT onboarding much easier
  • Welocalize-specific analytics, part of our secret sauce, along with the POS Candidate Scorer, Perplexity Evaluator and Tag Density Calculator, were used to analyze source content suitability
  • We used weScore, part of the Welocalize weMT framework, to calculate analytics on MT engine quality, such as auto-scoring, human evaluation and productivity metrics

The full TAUS presentation is available to view here: How STE and Analytical Tools Enabled Intuit MT Program Welocalize TAUS 2013

You can also view the presentation by Welocalize’s Olga Beregovaya at the TAUS Showcase at LocWorld here: WeMT Tools and Processes

In addition, you can learn more about the Welocalize and Intuit MT story, as presented by Tuyen Ho, Senior Director at Welocalize, and Intuit’s Render Chiu at LocWorld 2013 in Silicon Valley, by viewing “Silver Linings Playbook – Intuit’s MT Journey”. You can also read Tuyen’s blog about the presentation with Intuit.

MT and the French Riviera

By Laura Casanellas

This month I found myself in Nice, at the heart of the French Riviera, discussing Machine Translation (MT) and surrounded by experts in the field of MT and localization. As a member of Welocalize’s global Language Tools Team, this was a great opportunity to learn about the latest advances in MT.

I attended the Machine Translation Summit XIV, the international conference which takes place every two years, bringing together important names in the area of MT from the European, Asian and American sister associations.

In the world of localization and translation, MT is a growing phenomenon. Successful MT deployments are on the up. Perception around MT is rapidly changing; whereas a couple of years ago people would have focused on quality of MT output (or the lack of it as they saw it), nowadays what users want to know is how to make it work. This approach is helping the time, cost and quality equation which is always at the centre of every localization business. As more digital content is created and the possibilities of making it accessible to different locales become tangible, many companies are beginning to understand that not every type of text needs to be translated to the same level of quality. Once this level has been established, time to market and cost can be adjusted accordingly. The role of MT is becoming more significant as a consequence of the dramatic increase in volume of content.

The subject of MT is spreading into the commercial world. Proof of this was the large presence of commercial users among the conference attendees.

I presented at the MT Summit together with my colleague Lena Marg. Lena is in charge of training our language force on Post-Editing practices. In our presentation, “Connectivity, Adaptability, Productivity, Quality, Price….Getting the MT Recipe Right” we explained Welocalize’s practices around MT and what we consider are all the necessary elements to create a successful Machine Translation program.

Olga Beregovaya and Dave Clarke from the Welocalize Language Tools Team also took part in the Summit, delivering a joint presentation with Dr. Alon Lavie, CEO of Safaba Translation Solutions and research professor at CMU, a highly regarded figure in the world of MT. The presentation was entitled “Analyzing and Predicting MT Utility and Post-Editing Productivity in Enterprise-scale Translation Projects”. It set out the joint Welocalize–Safaba research, which has begun to identify the effect of features in ‘real-world’ content on post-editing efficiency and predictability, such as the presence, retention and placement of tags; recognized terminology; do-not-translate lists; and more.
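
As a loose illustration of the kind of content features that research examined (the term and do-not-translate lists below are invented, and this is not the Welocalize–Safaba model itself), one might count tags, recognized terminology and do-not-translate hits per segment:

```python
# Illustrative sketch: per-segment features of the kind discussed above
# (tag density, recognized terminology, do-not-translate hits).
# The lists and any downstream use of these features are hypothetical.
import re

TERMS = {"firewall", "endpoint", "policy"}
DO_NOT_TRANSLATE = {"ePolicy Orchestrator", "McAfee Agent"}

def segment_features(segment):
    words = re.findall(r"[A-Za-z']+", segment)
    return {
        "tag_count": len(re.findall(r"<[^>]+>", segment)),
        "term_hits": sum(w.lower() in TERMS for w in words),
        "dnt_hits": sum(phrase in segment for phrase in DO_NOT_TRANSLATE),
        "length": len(words),
    }

print(segment_features(
    "Open <b>ePolicy Orchestrator</b> and assign the firewall policy to each endpoint."))
```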

For the first time ever, this MT Summit showed a slightly higher attendance of industry and user representatives than academics, with a high proportion of participants attending the user track presentations. This is a clear sign that the commercial world is following the academics and researchers and that MT is becoming mainstream.

Welocalize’s strategy is to keep abreast of all the technological advances currently taking place around language tools in general and MT in particular, and to use and deploy those new MT technologies that we identify as having true commercial potential.

You can view the presentations by visiting:

http://www.slideshare.net/Welocalize/mt-summit-2013-welocalize-presentation-lcasanellas-and-l-marg

http://www.slideshare.net/Welocalize/safaba-welocalize-mt-summit-2013-usertrackpresentation

Another MT case study will be showcased at the forthcoming Localization World Silicon Valley, 9-11 October. “Silver Linings Playbook – Intuit’s MT Journey” will be presented by Render Chiu from American software company Intuit and Tuyen Ho, Welocalize’s Senior Sales Director for North America. Render and Tuyen will be talking about how Intuit and Welocalize architected an MT program that supports the enterprise and successfully met an aggressive product launch schedule.

Why Did Welocalize Join Open TM2?

For information on why we joined the Open TM2 initiative, please follow this link: http://www.lisa.org/globalizationinsider/2010/07/open_tm2.html

I have also included the interview below.

Smith

What is it about the Open TM2 initiative that motivated Welocalize to get involved?

Most translators use some type of translation workbench tool. Most clients use some type of content management tool, and most vendors use some type of translation management tool. To make it even more interesting, add machine translation tools, authoring tools and a variety of content types. Now combine all of those users and their various tools and try to pass the content type you want translated between each of them, and tell everyone they have half the budget, time and staff to do it!

Yes, I have exaggerated a bit to make a point, but the basic elements of this challenge are what I am hearing from clients, vendors and translators. Traditional methods across our translation supply chain are simply not up to the always-on velocity of end-user demands.

In order to increase velocity across the translation supply chain, we need to increase automation, which implies more integration, interoperability, extensibility – and standards. We are by no means the first industry to confront this challenge, so why not borrow what has worked elsewhere? At the heart of every sophisticated and mature supply chain is a consistently followed set of standards. As Craig Barrett, former Chairman of Intel, stated, “The world is getting smaller on a daily basis. Hardware, software and content move independent of, and irrespective of, international boundaries. As that increasingly happens, the need to have commonality and interoperability grows. You need standards so that the movie made in China or India plays in the equipment delivered in the United States, or the Web site supporting Intel in the United States plays on the computer in China.”

What sort of progress do you think has been made in the area of standards, and what work remains?

Unicode has probably been the most successful standard related to our industry. Unicode specifies a standard for the representation of text in just about any language across software products and systems. Before Unicode, there were hundreds of different encoding systems, and they often conflicted with each other. The significant problem was potential corruption in the passing of text representation data between different encodings or platforms. Thus the Unicode Consortium was formed, and to its credit, Unicode now “enables a single software product or a single website to be targeted across multiple platforms, languages and countries without re-engineering. It allows data to be transported through many different systems without corruption.”

Other standards, such as TMX, have not been as successful, and we need to understand why this has been the case. As Bill Sullivan, IBM Globalization Executive, stated, “There is a recognized and growing need for standards in the localization industry. Despite our best intentions, however, standards themselves can often be vague and open to multiple interpretations. What is needed are reference implementations and reference platforms that serve as concrete and unambiguous models in support of the standard.”

This is the work that remains. We need to demonstrate more tangible benefits for adhering to a standard in typical use case scenarios and integrations. How can a client easily integrate the translation assets of an acquisition? How can a client plug-and-play what they deem as the best tool components? How can a client change tools? These are the simple questions I hear. To get closer to the answers, the Open TM2 Steering Committee is working on a Joomla (content management), Open TM2 (translator’s workbench) and GlobalSight (translation management system) integration. The goal is to develop a viable data exchange standard which works seamlessly in this 3-way environment and then extend it to other integrations in the translation supply chain.

LISA will document and publicize the resultant standards. However, neither the Open TM2 initiative nor LISA alone can make the greater vision a reality. As the Unicode initiative demonstrated, broad participation and support across the industry is necessary to achieve success. The Unicode Consortium includes corporate, institutional, individual, NGO and public sector members all collaborating with a unified purpose.

What is different about the Open TM2 initiative?

Open source and “free” are often found in the same sentence. Yes, there is no charge to download an open source product such as Open TM2 or GlobalSight, but there is a cost associated with support, training and customization to specific needs. Open source is not a “free lunch”, but it is an opportunity to engage, integrate and customize at a much deeper level and at a faster pace. The result is potentially a product that is more suited to one’s needs, more easily integrated with other products and a lower total cost of ownership. But what you get out of it is subject to what you put into it. As an ancient Chinese proverb reads, “Talk does not cook rice.” We need people willing to take action. These concepts apply to all open source projects.

What I think is different, and exciting, in this Open TM2 initiative is an increasing alignment of broader interests. Industries typically do not change significantly until the market forces them to change (look at the American auto industry). I think there are some market mega trends in play right now (cloud computing, mobile computing, social computing, open source) and those who don’t adapt to these trends will quickly be left behind. The “translation project” as we knew it traditionally is rapidly morphing into on-demand translation. SimShip is rapidly morphing into SimStream (simultaneous streaming releases). Translation tools and platforms are rapidly morphing into “mash-ups” (combinations of different tools with the sum benefits being significantly greater than the individual benefits). The translation service on the whole is rapidly morphing into a utility inside a broader and more deeply integrated global content supply chain. RFPs now have pages and pages of interoperability, integration and optimization questions. And according to Gartner, “The number of open-source projects doubles every 14 months. By 2012, 90% of companies with IT services will use open-source products of some type.”

So, I think the timing is right. Many, certainly not all, clients, LSPs, tool providers and translators alike are realizing that it is in the best interest of the supply chain as a whole to collaborate to achieve something on the scale of what was achieved with the Unicode standard. “Do not go where the path may lead, go instead where there is no path and leave a trail.” – Ralph Waldo Emerson