How Can AI Help the Localization Workflow and Quality Process?
USE CASE: NLP-DRIVEN CONTENT ANALYSIS + LQA
How do you process more content in more languages for the same cost (or less) while managing risk and not compromising on quality? You use artificial intelligence (AI).
Global brands are managing huge volumes of content every day in multiple locales. Content levels are going up, but many localization budgets are not. Using AI-enabled tools in the globalization lifecycle, such as pre-production source content analysis and language quality assurance (LQA), can help manage quality and cost.
Jon Ritzdorf, Senior Manager, Global Content Solutions at Procore Technologies, and Alex Yanishevsky, Director of AI Deployments at Welocalize, addressed these challenges at a recent GlobalSaké technology event.
They considered the potential for using AI (specifically, NLP tools) to analyze source knowledge base materials and technical content for machine translation (MT) suitability and to create efficiency at the LQA stage.
Procore Technologies is a leading construction management software as a service (SaaS) company. It works in a highly specialized area, publishing large volumes of content, including the Procore Knowledge Base, the primary source of customer support for the global platform. The company works with Welocalize as its main localization partner.
A key localization goal for Procore is to quickly and accurately publish knowledge base content in different languages. Using MT with post-editing (MTPE) is a popular approach. However, to meet the demand for growing volumes in expanding language markets, publishing straight from the MT output is quicker and more scalable.
But how do you guarantee quality?
Using AI in the localization workflow can help manage that risk by automatically flagging only the problematic content. This helps localization teams decide whether content is suitable for MT in the first place, and it makes LQA much more targeted.
“We need to publish quality multilingual content fast, but a lot of our content is very specialized and highly technical. When you have 3,500 articles and 25 million words across 10+ languages, it’s hard to pinpoint if something needs to be fixed in the MT output. We still need a post-editing stage. AI gives us that ability to publish high quality multilingual content directly. Welocalize’s NLP solution can capture and process so much data—in ways humans can’t—to create efficiencies and savings in the localization workflow,” comments Ritzdorf.
The NLP tools analyze the source and target content to see where potential problems may arise.
In source content analysis, NLP can be used to rate whether the text is actually suitable for MT. The text may be so complex that raw MT may not produce high enough quality. If this is flagged early, the risk can be remediated by post-editing to an agreed-upon quality level.
For translated content, instead of huge amounts of MT output going through post-editing, only the files that have been flagged as complex need to go through a human review process. The tool can also identify if the translation process has introduced unnecessary complexity, relative to the source.
“In essence, the NLP tool is telling us, ‘this is the targeted content we need to check’, pinpointing potential quality issues in the source and translated materials,” states Yanishevsky.
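The routing idea described above, where only flagged files go to human review, can be sketched in a few lines of Python. This is a minimal illustration, not Welocalize's implementation: the `FileScores` fields, the 0-to-1 complexity scale, and both cutoff values are hypothetical and would need to come from a real scoring tool and real historical data.

```python
from dataclasses import dataclass


@dataclass
class FileScores:
    name: str
    source_complexity: float  # hypothetical score: 0 (simple) to 1 (complex)
    target_complexity: float  # same hypothetical scale, measured on the MT output


def triage(files, max_complexity=0.7, max_increase=0.15):
    """Split files into (needs_review, publish_directly) lists of names.

    A file is sent to human review if its translation is complex in
    absolute terms, or if translation added complexity relative to the
    source. Both cutoffs are assumed values for illustration.
    """
    needs_review, publish_directly = [], []
    for f in files:
        too_complex = f.target_complexity > max_complexity
        complexity_added = (f.target_complexity - f.source_complexity) > max_increase
        if too_complex or complexity_added:
            needs_review.append(f.name)
        else:
            publish_directly.append(f.name)
    return needs_review, publish_directly


files = [
    FileScores("getting-started", 0.30, 0.35),
    FileScores("change-orders", 0.40, 0.60),
    FileScores("api-reference", 0.80, 0.85),
]
needs_review, publish_directly = triage(files)
print("Review:", needs_review)
print("Publish:", publish_directly)
```

In this sketch, the second file is flagged because the translation is noticeably more complex than its source, and the third because it is complex in absolute terms; only the first would be published straight from MT.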
Without AI, for content types such as knowledge bases and marketing content, it would take too many people to perform proper subject matter expert LQA reviews that match the speed of AI.
AI-POWERED SOURCE CONTENT PROFILING IN TECHNICAL CONTENT
Let’s look at some sample text, typical in manufacturing and technical documentation:
This one sentence has 42 words, 22 nouns, 19 long words, and 9 complex words! Using AI-driven source content profiling, this sentence would be flagged as problematic and routed for a more in-depth review prior to translation. If a sentence like this went straight to MT, the translation would most likely be inaccurate, sparking a lengthy (and costly) review process.
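To make the counts above more concrete, here is a rough Python sketch of sentence-level profiling. It is an illustration under stated assumptions, not the tool discussed in the presentation: the cutoffs for a "long" word (7 or more characters) and a "complex" word (3 or more syllables) are assumed, the syllable count is a crude vowel-group heuristic for English, and noun counting is left out because it would require a part-of-speech tagger.

```python
import re

# Assumed cutoffs for illustration only; the article does not state them.
LONG_WORD_MIN_CHARS = 7
COMPLEX_WORD_MIN_SYLLABLES = 3


def estimate_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def profile_sentence(sentence: str) -> dict:
    """Return simple complexity counts for one English sentence."""
    # Words are runs of letters, optionally joined by hyphens or apostrophes.
    words = re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", sentence)
    return {
        "words": len(words),
        "long_words": sum(len(w) >= LONG_WORD_MIN_CHARS for w in words),
        "complex_words": sum(
            estimate_syllables(w) >= COMPLEX_WORD_MIN_SYLLABLES for w in words
        ),
    }


profile = profile_sentence("Install the hydraulic actuator.")
print(profile)
```

A production profiler would use proper tokenization and linguistic analysis, but even counts this simple show how a sentence can be scored before it ever reaches MT.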
NLP tools can also differentiate between content types, domains, and industries. Historical data is mined for each domain or product type. Then, complexity and readability thresholds are customized based on this historical data.
Yanishevsky comments, “You have to differentiate between industries and content types when looking at historical training data for AI. If thresholds were set to flag any sentence with more than 25 words, then virtually all legal content would fail. In contrast, virtually all user interface (UI) content would pass.”
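One way to picture per-domain thresholds is to derive each cutoff from that domain's own history rather than using a single fixed number. The Python sketch below assumes the threshold is the 90th percentile of historical sentence lengths; that choice, the function names, and the sample data are all hypothetical and only illustrate the principle Yanishevsky describes.

```python
from statistics import quantiles


def threshold_from_history(sentence_lengths, percentile=90):
    """Return the given percentile of historical sentence lengths (in words).

    Using a percentile as the cutoff is an assumption made for this sketch.
    """
    # quantiles(..., n=100) returns 99 cut points: index 0 is the 1st percentile.
    cut_points = quantiles(sentence_lengths, n=100, method="inclusive")
    return cut_points[percentile - 1]


def is_flagged(word_count, threshold):
    """Flag a sentence whose length exceeds its domain's threshold."""
    return word_count > threshold


# Made-up historical sentence lengths per content type.
history = {
    "legal": [30, 35, 40, 45, 50, 55, 60],
    "ui": [2, 3, 4, 5, 6, 7, 8],
}
thresholds = {domain: threshold_from_history(lengths) for domain, lengths in history.items()}

# The same 28-word sentence is judged differently in each domain.
for domain, threshold in thresholds.items():
    print(domain, "flagged" if is_flagged(28, threshold) else "ok")
```

With thresholds set this way, a 28-word sentence is unremarkable in legal content but stands out sharply in UI content, which is the distinction a single 25-word rule would miss.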
Ritzdorf concurs with this approach. “Highly technical content can be challenging. It often contains long sentences and lots of nouns. Using AI-driven source and target content profiling solutions could help us identify problems for tech-heavy content, such as Procore Knowledge Base. It would provide the confidence to be able to publish straight from the MT output, skipping the post-editing stage and creating efficiencies by pinpointing errors in the source and target text,” he adds.
SOME KEY RESULTS
- NLP-driven automation to process content and raise potential red flags
- AI-enabled scalability to manage growing content volumes
- Identification of content not suitable for raw MT
- Estimated 20% time savings
- Estimated 15% reduction in LQA costs
Alex Yanishevsky and Jon Ritzdorf presented this use case at the GlobalSaké Q3 2022 Virtual Event.