NLP in Digital Content. Who’s BERT? Webinar Highlights

Welocalize May 20, 2020

For global brands, being found online by the right customers, at the right time, in the right language is an all-important part of the overall digital customer journey. When most people are thinking about buying something, wherever they are in the world, they reach for their device and type a simple phrase into a search engine – often Google.

Search is a huge part of any digital inbound marketing campaign and often the primary source of website traffic. Understanding language, local nuances, and how to get top search results is an ongoing challenge for many marketing and localization teams.

The Welocalize webinar, ‘Natural Language Processing (NLP) in Digital Content. Who’s BERT?’, looks at NLP and at BERT, a language model that affects search and overall content performance in multiple language markets. Click here to view webinar recording.

‘BERT is the biggest leap forward in the past five years, and one of the biggest leaps forward in the history of search.’ Pandu Nayak, Google Fellow & Vice President, Search

The aim of this webinar is to help marketing and localization teams improve the visibility and performance of their international content strategies, using NLP and BERT. As Google owns a large share of the overall search landscape, it makes sense to take advantage of its newest technologies to drive higher rankings, on the assumption that where Google goes, other search engines follow.

What is BERT? BERT (Bidirectional Encoder Representations from Transformers) is a contextual language model that greatly improves the way computers can understand language and its nuances.

Natasha Latysheva, Machine Learning Engineer in Welocalize’s AI and NLP deployments team, opens the webinar by explaining exactly what NLP is, then looks at the role of NLP and language models in global content, using predictive text in smartphones as the most obvious example of how language models work in everyday life. She explains how NLP applies machine learning to solve language problems and that, by leveraging BERT as a language model, Google better understands the sense or meaning of search queries.

Gurdeep Gola, Director at Welocalize Digital Marketing, focuses on how you can create content with NLP and BERT in mind and how it impacts content performance to increase customer acquisition. He looks at the four steps to success: Define Intent, Be Direct and Clear, Improve Salience, and Test It! Gurdeep outlines opportunities that brands can take to improve their international content strategies.

In the live webinar, we received some thought-provoking questions. Here are some of the questions, answered by Natasha and Gurdeep:

How does the BERT language model handle words that have the same spelling but different meanings – such as “well” meaning “good”, “thorough”, “a well for water”, “in good health”, etc.?

The “meaning” of the word “well” would depend on its context and usage within the sentence. BERT generates internal representations of each word (basically a list of numbers), and this representation takes into account all other words in the sequence. This means that the representation of each word changes depending on its usage, allowing a single word like “well” to play many different roles.
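To make the idea of a context-dependent “list of numbers” concrete, here is a minimal sketch in Python. It is not BERT: it applies a single simplified attention step to hand-picked, hypothetical three-number word vectors (the `STATIC` table and the `contextual_vector` function are assumptions made for illustration). It only shows that the same word can end up with different numbers in different sentences.

```python
import math

# Hypothetical hand-picked word vectors, for illustration only.
# Dimension 0 loosely stands for "water", dimension 1 for "feelings/health".
STATIC = {
    "the":   [0.1, 0.1, 0.1],
    "is":    [0.1, 0.1, 0.1],
    "well":  [0.5, 0.5, 0.5],
    "water": [1.0, 0.0, 0.0],
    "deep":  [0.9, 0.1, 0.0],
    "i":     [0.0, 0.8, 0.2],
    "feel":  [0.0, 1.0, 0.0],
}

def contextual_vector(sentence, target):
    """Blend the vectors of all words in the sentence, weighted by how
    strongly each one matches the target word (a simplified attention step)."""
    words = sentence.lower().split()
    query = STATIC[target]
    # Similarity score between the target word and every word in the sentence.
    scores = [sum(q * k for q, k in zip(query, STATIC[w])) for w in words]
    # Softmax turns the scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    weights = [e / sum(exps) for e in exps]
    # The contextual vector is the weighted average of all word vectors.
    return [
        sum(wt * STATIC[w][d] for wt, w in zip(weights, words))
        for d in range(len(query))
    ]

water_sense = contextual_vector("the water well is deep", "well")
health_sense = contextual_vector("i feel well", "well")
print(water_sense)
print(health_sense)
```

In the first sentence the vector for “well” is pulled toward the “water” dimension, and in the second toward the “feelings/health” dimension. BERT does something similar in spirit, but with learned vectors and many stacked layers.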

Does Google Translate use BERT?

Not exactly. Like BERT, Google Translate is based on deep neural networks, but the NLP tasks that the two models solve are different:

  • BERT is fundamentally a language model which seeks to model language and predict missing or masked words.
  • Google Translate is a translation model that maps sequences of words from one language into the corresponding sequence of words in a different language.
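The “predict missing or masked words” task can be illustrated with a toy sketch. The example below is not how BERT is implemented: it uses simple counting over a tiny made-up corpus instead of a neural network, and the names `CORPUS` and `predict_masked` are hypothetical. What it shares with BERT is that the guess for the masked word uses the words on both sides of the gap.

```python
from collections import Counter

# Tiny made-up corpus, for illustration only.
CORPUS = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat slept on the mat",
    "the dog ran in the park",
]

def predict_masked(sentence, corpus=CORPUS):
    """Rank candidate words for the [MASK] position by counting which words
    appear in the corpus between the same left and right neighbours."""
    tokens = sentence.split()
    i = tokens.index("[MASK]")
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None

    counts = Counter()
    for line in corpus:
        words = line.split()
        for j, word in enumerate(words):
            l = words[j - 1] if j > 0 else None
            r = words[j + 1] if j + 1 < len(words) else None
            if l == left and r == right:
                counts[word] += 1
    return counts.most_common()

guesses = predict_masked("the cat [MASK] on the rug")
print(guesses)
```

A translation model, by contrast, would take a whole sentence in one language as input and produce a sentence in another language as output, rather than filling a gap.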

There is another connection between BERT and Google Translate – both use a type of neural network called a transformer, which was originally developed by Google and has become the state-of-the-art approach to solving many problems in NLP.

A lot of the examples given [in the webinar] are consumer, lifestyle-focused. How does this apply in the B2B market where people might not ask questions?

All principles of making content accessible for NLP are applicable to both B2C and B2B. User journeys will be very different, but understanding the intent and creating concise content to meet that demand will work for both users and language models.

How do I prioritize which content to ‘optimize’ or create with BERT in mind?

Generally speaking, BERT will have a much better understanding of the relevance of your content without you having to optimize. However, you should make sure you can test for further improvements by looking at some of the key questions asked by your users. Have you got FAQ pages to answer these queries? Are other sites ranking for Q&A terms around your products and services? If you can answer these queries and expand these answers to cover variations of intent, it will absolutely help that content surface to the right users.

This all feels like a huge technical advancement, what do you think will come next?

NLP has seen an enormous amount of progress in the last few years, and it’s fair to say that it is one of the most fast-paced and exciting areas of machine learning at the moment. Even in the last year, there has been some landmark work done in one of the hardest problems in NLP – automatically generating convincing language. The classic example is the GPT and GPT-2 language models from OpenAI, which write incredibly natural and human-like text. Big language models like GPT-2, BERT, and others are a huge step forward in allowing machines to better understand language, which will undoubtedly have effects in downstream tasks – from better understanding of users’ Google queries, to even more challenging tasks like improving the capability of chat bots to naturally converse with humans.

Further automation leveraging BERT is inevitable, for example in improving automated ad formats such as Dynamic Search Ads (for paid search). This technology will ultimately mean that contextual placement and automation in advertising will improve, with the aim that advertisers will be able to self-serve to a greater extent in the future.

What are your most cutting edge clients doing with this technology?

It varies widely according to the sector and specific needs of the client. We see people interested in solving a variety of language-related tasks – machine translation, text classification (for example, sentiment analysis or classification of text into topics), text clustering (for example, grouping similar pieces of text, smarter de-duplication), and text generation (for example, automatically generating variations of sentences) are common classes of problems.
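As a rough illustration of the de-duplication task mentioned above, the sketch below flags near-duplicate snippets by comparing their sets of words (Jaccard similarity). This is a deliberately simple stand-in, not a description of any client project or of a BERT-based approach; the function names, the sample snippets, and the 0.8 threshold are all assumptions chosen for the example.

```python
def jaccard(a, b):
    """Share of distinct words that two texts have in common (0.0 to 1.0)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

def deduplicate(texts, threshold=0.8):
    """Keep a text only if it is not too similar to one already kept."""
    kept = []
    for text in texts:
        if all(jaccard(text, k) < threshold for k in kept):
            kept.append(text)
    return kept

# Made-up marketing snippets, for illustration only.
snippets = [
    "Free shipping on all orders over $50",
    "Free shipping on all orders over $50 today",
    "Contact our support team for help",
]
unique = deduplicate(snippets)
print(unique)
```

Word overlap misses paraphrases that share few words, which is where comparing model-generated representations of each text, rather than raw words, can make de-duplication “smarter”.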

We’re seeing the gap between language and performance being bridged: language can be a performance driver and a KPI that actively impacts customer acquisition.

Do you think NLP and machine translation could in the future almost or entirely replace human linguists? As opposed to “controlled language”?

Machine translation and other NLP approaches are likely to become increasingly popular and widely used, and the quality of these methods will continue to improve. Content that is particularly high-stakes or creative – like marketing content, landing page text, or literature – will likely always benefit from being written, edited, or at least sanity-checked by humans. However, it seems probable that more routine linguistic tasks will become increasingly automated as the methods continue to improve in quality.


If you would like to continue the discussion on NLP and BERT or take part in one of our discovery sessions, connect with us here.

Further Reading: NLP and BERT – A New Way with Words