The Art of Collecting Multilingual Datasets with Aaron Schliem

welocalize July 18, 2022

Artificial intelligence (AI) has opened up new possibilities for customer support, giving global brands opportunities to provide an optimal customer experience.

Multilingual conversational AI is an important but complex part of that. Building these systems takes a high level of expertise to meet the necessary business, cultural, and linguistic requirements.

Aaron Schliem is a Solutions Architect at Welocalize, specializing in AI solutions. Put simply, Aaron works with clients to figure out how they can optimize the process of collecting data that they will use to train and test different AI services such as chatbots or in-product support. 

Aaron recently chatted with our AI Services Director, Tiarne Hawkins, as part of her LinkedIn event series, You & AI, about his role in AI at Welocalize.


Watch this You & AI session with Aaron now. 


How did you end up working in AI? What’s your background? 

I grew up in a small town in Wisconsin, and all I wanted was to get out and see the world. Part of my curiosity about the world has been about language, but also about culture. I’ve always been fascinated to see how people behave in different cultures. When I travel, I rarely stay at the conference hotel. I stay somewhere else, and I force myself to take public transport. For example, I want to know what it’s like to ride the subway in Tokyo, because that’s a part of life; it’s a part of culture.

I moved to Chile when I was still in college and lived there for five years. I started a business there, teaching English, translating, and doing scientific editing for academics. When I came back from Chile, that’s what I knew how to do – language stuff! 

I’ve basically had every job you can think of in the language industry, from language teaching to medical interpreting and voice acting.

I got into AI by chance. I founded a company 20 years ago in Seattle which has some big tech clients buying data to build algorithms around search relevance, product relevance, spell checks, text input systems, and more. Those were the opportunities that showed up, and because I love language and culture more than I like business, I fell into that!

Luckily, I started working here at Welocalize, which is a great company, and there were emerging opportunities. I jumped right in, and it’s a great fit.

What does a typical day in your life look like?  

On the one hand, there’s time that gets spent with clients, who are often big brands with customers all over the world. A big part of what we do in the solutions team is to talk to clients not just about the specifications and the data, but also about what they are trying to build. What are their objectives? Who is their market? We want to figure out what they need and why they need it.

We also spend quite a lot of time working with our product team and our natural language processing (NLP) engineering teams. We’re always trying to make things efficient for Welocalize, and that means asking: How are we going to actually dispatch work to our workforce? How are we going to pay people for the things that they’re doing? How are we going to integrate NLP tech to make the work better and faster?

Can you give an example of a typical conversation with a client around building multilingual datasets?  

A good example is a client who wants us to build a dataset around conversational AI, and they want to build it in Spanish. Well, that’s a big world; the world of Spanish could be a whole lot of things! We need information about who the users are. Who are these people? Do they only speak Spanish? Do they live in Chile? Los Angeles? Did they grow up speaking English and Spanish?

There are a lot of questions about the nature of the product (for example, a chatbot) and who it’s going to serve. Then there are a bunch of questions about what the data needs to look like. In order for your data science team to use it, what do we need to deliver to you? It’s a really good way to generate some interesting conversation that I think is often fruitful for our client organizations.

What do you think makes a successful client program?  

I think a successful program from the client side is one that understands why it’s doing the things that it’s doing. I think there’s a real risk right now where organizations are excited about AI because it’s a hot new topic.

They’ll say, “Let’s do some AI, let’s make some AI happen,” which is not a good reason to do AI.

AI is a tool to produce a certain result or a certain kind of experience. Clients need to know what kind of experience they’re targeting or what kind of gains they’re looking for. So, if we have a client who’s building a text-based chatbot, how are they going to measure success? Is it that they want to lower the number of human customer support tickets? Is it that they want to improve the speed of resolution in their customer support? What’s going to be the measure of success?

Successful multilingual AI programs also understand the difference between translation and building multilingual AI.

How do multilingual datasets and general translations differ? 

There is a really key distinction that is often lost. Usually when organizations think multilingual and global, they automatically think it falls under localization. But localizers generally don’t know very much at all about AI, and they don’t know very much about data as a deliverable. They don’t understand how language manifests in a dataset versus in published content on your website or in a product UI.


For further insights, check out Aaron’s session “Global CX and Support Through Two Lenses: Localization and Conversational AI,” from our 2021 Let’s Go On Demand Summit. His session discussed the capabilities of conversational AI in delivering optimal customer experience. 

Find out how Welocalize can help you take your AI and content global. Connect with us here