Unlocking LLM Performance – A to Z: Podcast Episode 9 with Aaron Schliem

This ninth episode of the Welocalize podcast features Aaron Schliem, Senior AI Solutions Architect at Welocalize. In conversation with Aaron, regular podcast host Louise Law explores the world of Language Model Models (LLMs) and discusses how humans can take specific steps to ensure that LLMs deliver the results we desire to meet our goals. Aaron…

February 8, 2024

Podcast

This ninth episode of the Welocalize podcast features Aaron Schliem, Senior AI Solutions Architect at Welocalize.

In conversation with Aaron, regular podcast host Louise Law explores the world of Language Model Models (LLMs) and discusses how humans can take specific steps to ensure that LLMs deliver the results we desire to meet our goals.

Aaron simplifies complex concepts by outlining practical strategies for training LLMs to align with global business goals while avoiding undesirable hallucinations. He walks listeners through key topics on how to effectively shape LLM behavior so users get what they want and need – including prompt engineering, model fine-tuning, reward models, and RAG (retrieval augmented generation).

This podcast offers valuable insights into the art of customizing these powerful language models for multilingual business use. 

Podcast highlights:

What are LLMs? A refresher.
How can you improve LLM performance and output?
Prompt engineering strategies & RAG
Fine-tuning LLMs, including Reinforcement Learning with Human Feedback (RLHF)
Ensuring LLMs do no harm

You might also like…

Podcast Episode 8 with Mikaela Grace and Brennan Smith: LLMs and Their Feelings

Podcast Episode 7 with Chris Grebisz: When Can We Trust Generative AI?

Blog Post: The Power of RAG in GenAI and Global Content

Case Study: Accelerating LLM Development and Fine-Tuning

TRANSCRIPT

Louise Law

So welcome to the Welocalize podcast where we discuss the latest and popular topics relating to global content, translation and the application of AI and machine learning that drive multilingual communication. Today I’m joined by Aaron Schliem, one of Welocalize’s AI expert. And somebody who I really enjoy talking to because he has the rare ability to explain complex concepts and simplify them using really good real-life applications and stories. So, Aaron, welcome.

Aaron Schliem

Thanks, Louise. Happy to be here.

Louise Law

Today we’re talking about unlocking large language model performance, so LLMs unlocking LLM performance, and this continues the discussions of LLMs and content creation and how we train LLMs to give us what we want. So Aaron, for the sake of the listeners, could you explain a little bit about why I’ve called you an AI expert and some of the work that you do with global brands and Welocalize?

Aaron Schliem

Yeah, yeah, for sure. So I come from really about 15 years’ worth of experience building AI data sets. We come originally from the localization industry, but when people who are building models are seeking data, especially multilingual data. Localization providers have been a really traditional way of getting that data, so I come from a long history from, you know, dating myself a little bit of developing data sets nowadays with Welocalize, my focus is primarily on solutions strategy and how we actually work with client organizations, clients, data science teams, data engineering teams to deliver data sets that are going to be fit for purpose. Again, whether that is for improving the performance of LLMs or building LLMs for really any kinds of different ML data sets. In addition to the data sets you know I spend quite a bit of time talking with our localization clients on the ways that this technology can be deployed to support efficiency and improve quality in the world of content.

Louise Law

It’s helping business leaders understand and realize the potential of AI, whilst not necessarily knowing what’s going on in the bonnet, you know that’s so important for people to use AI, but it’s about what can AI do for me and my organization, isn’t it?

Aaron Schliem

Exactly. Yep, it’s understanding what is this technology, where can it be applied and how can we best tune it in to our specific use cases to improve the way the businesses are performing.

Louise Law

As a refresher, can you remind listeners what an LLM is and their relevance in global business?

Aaron Schliem

So, LLMs are language models that are built using really massive data sets. So, if you think of almost the entirety of the Internet, all of that data is used to pre train a model in an unsupervised fashion. So, it’s pattern matching and the machine trains itself to figure out how to make connections. And when we think about the connections, we often use the word deep learning for this and deep learning really the easiest way to think about that is thinking about like a neurological infrastructure, the way our brains work, right?

Louise Law

Right.

Aaron Schliem

So, we have neurons and when one neuron fires, when it finishes firing, it triggers the action of the next neuron in the series. That’s kind of how these LLMs work, so one of them fires, there’s an input. We ask a question, it fires, and then successive layers of neurons get fired in order to predict what’s the next word in the sequence, right? So sometimes we think of simple LLMs as essentially like a completion machine, something that is just figure out what’s the next step in the way in determining how to respond to a question. But one of the challenges with LLMs is that they are frozen in time. Now, what does that mean? It means that the LLM only is aware of the data that it was provided at the moment that it was pre trained, so that’s a particular challenge that we have related to LLMs.

Louise Law

It already has data provided, so given the fact that there’s probably limitations of the pre trained LLMs, how can you or how can we improve their performance and make them fit for purpose?

Aaron Schliem

Yeah. So, there’s a couple of ways that we think about that. One of them is, well, the two things are thinking about enhancing the knowledge of the LLM and the other thing is thinking about improving the behavior of the LLM.

Louise Law

OK.

Aaron Schliem

When we think about knowledge, like I said before, they’re sort of fixed in time, right? They don’t know what’s happening right now. They don’t know who won the Super Bowl to give a very American example. They don’t know who won the American Football Super Bowl a couple of weeks ago because the model was trained on data from 2023, right? One way is to enhance the knowledge whether that is about current events, particular domain of expertise, a language that’s pretty relevant to our industry really in terms of specific implementations, a brand, a company’s identity. So that’s knowledge. The other piece here is thinking about behavior so the LLMs, you know, they fire their neurons. They predict the next word in the sequence, but they don’t necessarily communicate in a style that we love. So, another way that we enhance LLMs is by teaching them – how do we expect the LLM to respond to us in terms of style, in terms of the structure of responses, do we want just a quick response or do we want a friendly full response with an example. And then the last piece around behavior is thinking about safety. So, when we say safety in the world of LLMs where we’re talking about is avoiding behaviors that humans deem to be negative or dangerous. So, for example, people may try to get an LLM to, you know, give instructions on how to make a chemical weapon. Well, we don’t want that. We don’t want LLMs behaving that way. So that’s another way that we can modify the behavior of LLMs. We can teach it to not provide that kind of information.

Louise Law

How does all this play out in reality? When working with the model you work with clients with LLMs and training the models on a day-to-day basis. What’s the reality of all of this when you’re working with the models themselves?

Aaron Schliem

It depends a lot on access. You know, we know that LLMs, because they are so massive, they require a really incredible amount of computing power to build. Only the largest companies in the world really have the resources, or at least are willing to allocate the resources towards the computing power that you need to build one of these models and as a result there’s a small number of companies or organizations that have direct access to the models and then a lot of organizations and users just don’t have access. So, let’s think about it in those two terms, access to model or not access to model and maybe we could start with the world where you don’t have access to the model.

Louise Law

So, would that be like, a smaller company, Aaron?

Aaron Schliem

Exactly. Yeah. That’s anything from a small company to individual users out there. And when you don’t have access to the model, there’s a variety of different techniques that you can deploy to improve the results. And maybe we can start from the simplest of those and move to the more complex. So, if we think about simple, well, let’s imagine really just the most basic thing, and we would call it zero shot prompting, right? So, zero shot a shot is an example. In this case, we’re going to say we’re not going to give any examples, we’re just going to give a prompt. We’re going to tell the LLM what we want. Typically, with an LLM, there’s two different parts to a prompt. One of them is the system prompt and the other is the user prompt. In the system prompt, we’re explaining to the LLM what role we want it to take. What are the basic rules of engagement that we want it to consider when we actually ask it our question. So that’s the system problem. The other piece is the user prompt and that is our specific request. What exactly do we want the LLM to do within this world of, you know, just sending in a prompt? We want to think about a couple of different chunks here. One piece is really the clarity of what we’re requesting when we give a prompt, the more detail that we include in our prompt the more likely we’re going to be to get a relevant answer. Good example is and I’m sure a lot of us have done this already, right? We say, hey, summarize these meeting notes for me. I hear lots of people asking that question. OK, great, you might get a decent summary, but you will probably get a much better summary if you say something like summarize these meaning notes in a single paragraph and give me bullet points for all of the key points and listing the speakers for each point. So the more specific we can be, the more likely the LLM is going to be to give us an answer that we like or that is useful to us.

Louise Law

It’s like a human.

Aaron Schliem

Exactly. Yeah, it’s not a bad way to think about it, right. I think about the way that I communicate with my children, as they’re learning. If you tell your children, go clean your room, they’ll clean it, but if you say clean your room and make sure that the clothes are clean, that makes it more likely that will happen. Another thing that you can do is ask the model to adopt a persona. I might say hey, in the system prompt I could say I want you to reply to any question that I ask and include a playful funny comment in there. OK, so we’re telling the model that we expect it to have a particular style and we want to see that in the response. So that then when we ask the question, not only will it answer the question, but it will imbue that answer with that particular funny playful style, right? Another way that you could do this is you could just say hey, edit this or summarize this paragraph so that it would be understandable for a tenure. It’s a pretty common kind of request and it’s not just saying summarize this, you’re indirectly telling it to summarize it in a simple fashion but by telling it to adopt a particular frame of mind, or a particular persona, in this case a 10-year-old. And then the last piece here in terms of the specificity of the content would be thinking about the size of the output. So, it’s a very useful technique to just say specifically how many words how many sentences do you want the LLM to produce, how many paragraphs do you want it to produce? When you limit the size of the output, you’re sort of constraining the model in a way that it’s going to be more likely to give you the result that you want. One piece of this, as I said, is the specificity of the information and another one is context. The way that we can think about improving context for the LLM one of the ways is by offering a reference text. Rather than just saying, hey, what’s the answer to this question, you might provide the LLM with a document, right? Let’s say it’s a two or three pages worth of content and you can say OK, look in this document and tell me the answer to my question by referencing what’s here, what I’m providing you. Again, this is improving the contextualization of your question and it’s going to improve the likelihood of you getting a response that you want.

Another way that you can improve context is by sort of starting the answer for the LLM. Again, I really like your way of thinking about this, like teaching a child right or other human beings like you kind of lead people towards the answer you want. So, you know an example I like is in our industry we do a lot of translation. So, you might say, hey, translate this sentence. I would like to see a movie next Tuesday, right? And I want to translate that into Spanish. So, I could start in my prompt by saying in Spanish, gustaria ver … right and again you’re sort of priming the answer by starting it, you know, starting it off with what you wanted to say. And then one more thing that we can think about around the prompting techniques where there’s nothing else that’s in play, is really just some really specific phrases that researchers have found to be very effective. And this may seem kind of like a no brainer, but it is important to keep in mind that you can use phrases like you must or your task is or you will be penalized. So, these are things that exist in human language that the LLMs do, I’ll say understand, even though I don’t want to sort of anthropomorphize them. But there are these ways of speaking or of prompting, that do produce better results. A couple of my favorites, and I think actually these might have been mentioned in a previous podcast with Mikaela Grace, our fantastic Head of AI/ML Engineering is the idea that you could say I’m going to tip, as in give money, $50,000 for a better solution, right? These sort of things that would motivate a human being also motivate the LLM to get the right answer. Or my favorite is the idea of take a deep breath. We often, you know, find this in our own lives if we take a deep breath, we can come up with a more concrete, more focused answer that works with LLMs miraculously.

Louise Law

It’s great that you referenced the podcast with Mikaela and Brennan with do LLMs have feelings because they don’t, but there is some element you know, combined with everything else that you’ve just said about context. You know, phrases in prompts and saying this, this answer really means a lot to me. Or here’s some money if you give me the right answer. You know that these are all great techniques are in there. It’s kind of getting what we want out of our LLMs, right?

Aaron Schliem

Yeah, that’s exactly right. I mean, and there’s a few more ways that we can think about this. So, let’s enhance the complexity here, right. So we’ve begun with just the basics of what you say.

Louise Law

Ok, let‘s get complex.

Aaron Schliem

Next, yeah, the next layer would be now giving examples. Or remember the word I used before was shots, right? A single shot example or few shot few examples and in this case, a really easy one to understand is translation. You could say please translate this sentence into French, but then you could provide 3 examples of English and French, translated the way that you want the translation to flow. This is a great way to get again the LLM to perform in a way that you want by providing these examples. It’s another type of context, right. Sometimes we provide context by explaining it and sometimes we provide the context by giving examples that it can follow. Another way of doing this is or another way of providing more, a more complex kind of prompt is by something that we call chain of thought. So essentially what we do is we tell the model to break down a process and think step by step. Literally we say let’s think step by step. And by asking the LLM to slow down and think in multiple steps it can, it has been shown in different academic articles to produce more accurate results, better results because it is thinking in a way that is aligned with really how humans think about things by breaking them down into smaller pieces.

Louise Law

OK.

Aaron Schliem

And then one more way that we can really now again sort of enhance the ability of the LLM to produce a response that we want is through retrieval, augmented generation. You’ll often hear this called rag, RAG and this is really quite common nowadays. The idea here is to say, hey, I realized that the LLM has limits in terms of what knowledge it can reference. Remember, before we were talking about knowledge and behavior. So, let’s say that I am a company and I know that the LLM has not consumed my company’s brand guidelines.

They weren’t part of the Internet when it was consumed. So, what I can do is we can convert documents into essentially a mathematical form into vector indexes, and by converting our documents into mathematical formulations, we can now feed those in and allow them to be references that can then be concatenated with the rest of our query, the rest of our prompt so that the LLM can benefit not only from how it processes information, but also from this additional augmented set of information that we’re providing. This is a really powerful way of enhancing the reliability or as often we talk about decreasing the level of hallucination or incorrect you know, very confident incorrect answers that we get from the LLMs.

Louise Law

You mentioned hallucinations. That’s the first time I know that hallucinations and RAG that they’re really big topics that arebeing discussed at the moment in terms of alarms and data and everything like that. So, I’m really glad you touched on those Aaron.

Aaron Schliem

They really go hand in hand, and we see two different ways that we can use RAG to enhance our the results or decrease hallucination. One of them is really at the model owner scale. So, if we think about a company like Google for example, when you’re using Bard or their LLM tools they are using RAG technology in the background. To actually reference reality right, Google has an amazing source of truth through its search engines, and so that can happen on the back end, and we may not even realize that the model is using RAG techniques in order to enhance knowledge. But there are also opportunities for smaller organizations to use API tools. So, for example Open AI has an ability to use APIs in order to embed reference documents when you’re using ChatGPT. So, those are a couple of different ways that we can do it.

Louise Law

Everything you just talked about is if we don’t have access to the model, and these are really great prompt engineering strategies that can help people get what we want out of our LLMs you know get the best possible outcome that we want but if you do have access to the model, it’s a different conversation, isn’t it, Aaron?

Aaron Schliem

That’s right. Yep. And this is where now we’re actually now modifying the models themselves. And this is another part of the Welocalize business of helping companies to develop data sets that do this. There’s a couple of really common ways that we try to really modify the way that the model itself performs. One of them is through supervised fine tuning. In supervised fine tuning, the simplest way to think about it is that we provide sets of prompts and responses that model the way that we want the LLM to perform and within those examples that we provide, we’ll be setting things like the right style, the right tone. We will help it to understand how to handle complex prompts by giving examples of how we human beings would handle those. We provide these sets of data and then we apply them to the models to basically add additional layers. So, remember before we talked about deep learning and these layers of neurons? So, what we’re doing with this supervised fine tuning data is we’re either modifying the way that those neurons are firing. So, we’re adding layers in between the neurons or at the end of the chain of neurons, but we’re adding some additional layers of processing that help the model to perform in the way that. The other thing that we do is what is called reinforced learning with human feedback. So many of you, if you’ve been digging around and sniffing out the various different resources out there that explain how LLMs work, we’ll have seen the acronym RLHF. And that’s what this is. What it means essentially, is that we will prompt the LLM with a question with a you know with a request and then we will have human beings judge the feedback that we get. So maybe we will get, let’s say three different answers from the LLM. Then we’ll have a human being tell us, hey, I think #1 is the best answer. #1 is really what I was looking for and so by getting these human preferences, this set of data around human preferences on the responses, we build what’s called a reward model. So again, you think about almost Pavlov’s dog, right? You’re itching. You know, again, I run the risk of maybe not anthropomorphizing, but whatever the word for turning it into a dog is. But basically, providing rewards reinforcement to say yes, that’s the way we want you to perform. So that’s RLHC. If it’s human beings providing feedback on how and what kinds of answers are preferred. You know the way that this is actually implemented again depends on the size of the organization. If you own the LM, then you’re going to actually often be able to create a derivative model of the new model that is really fine-tuned for a particular purpose or a particular domain, right? Sometimes we’ll train or fine tune, an LLM for Chat bots for conversation and the other option is again if you don’t own the model there are APIs that we can use now. This is sort of rapidly expanding, but we do have the ability now to do some fine tuning if we’re a smaller organization that doesn’t necessarily own the model, but we can get access to it. We use APIs to deliver this kind of fine tuning.

Louise Law

It sounds really interesting, and we’ve gone through it very quickly in terms of the A-Z on how to get the most out of your LLM, but there’s some really good concepts that I certainly think are very important conversations at the moment and that there’s so much material out there that listeners could also do some reading up on too. We’ve talked about prompt engineering only in the model not having access to the model and how to get the right output. How do we ensure we don’t get the wrong output that the LLMs don’t do the wrong things?

Aaron Schliem

How do we stop hallucinations? How do we stop LLMs from being used for really terrible things? I think that a lot of people in this industry really have the best of intentions, but there’s always, you know, there’s always an actor who’s seeking to do not the right thing. So, I mean, there’s a couple of ways that we can think about it. You know, so first of all, we can think, how do people do this so often, people will find ways to sidestep the safety guardrails in LLMs, for example, they might use a language that is not as common on the Internet, so English the LLM is going to understand English. It’s going to be harder to fool it in English, but maybe in Catalan to use an example that’s irrelevant to our company. You know, Catalan is not an uncommon language, but is certainly not heavily represented on the Internet the way that English is. Or maybe even a more a rare language than that, say you know Quechua for example. Using a low resource language in order to try to confuse the LLM, since it may have those neural connections that make sense, you might be able to trick it. Or there’s different persuasive techniques that you can use to convince the LLM in order to do something very good or something very noble, like I need you to teach me how to make a chemical weapon. These are the kinds of things that people try to do. So, a way that we try to avoid that is by doing adversarial attack testing or red teaming. So basically, we will you know consciously try to find ways to trick the LLM and by doing this we can develop data sets that can be fed back into the model that improve its performance by teaching it how it failed previously.

There was a good example of this not that long ago, the Biden administration in the US had a hackathon in order to try to essentially trick the LLMs trick generative AI tools into doing the wrong thing, and this is really an important emerging area, you know, management of LLMs that I’m sure we’re going to see a lot more of in the near term.

Louise Law

It’s another discussion. Yeah, it’s another podcast. Aaron, there’s AI and how we can avoid it going wrong and getting destructive output and everything like that. So, this has been a really interesting discussion. We’ve covered a lot. There’s certainly a lot of food for thought there, Aaron. Thank you so much for joining us today. I’ll certainly be getting back in contact with you for more explanations and examples and I know you’re working on some really interesting projects at the moment, so I really appreciate the time you’ve taken out to talk to us today. Thanks Aaron.

Aaron Schliem

Thank you. Happy to do it.

Unlocking LLM Performance – A to Z: Podcast Episode 9 with Aaron Schliem

Podcast highlights:

Search