
Episode 21

Using Open Source LLMs in LanguageTool for Grammatical Error Correction (GEC)

Bartmoss St Clair



At LanguageTool, Bartmoss St Clair (Head of AI) is pioneering the use of Large Language Models (LLMs) for grammatical error correction (GEC), moving away from the tool’s initial non-AI approach to create a system capable of catching and correcting errors across multiple languages.

LanguageTool supports over 30 languages, has several million users, and over 4 million installations of its browser add-on, benefiting from a diverse team of employees from around the world.

Show Notes

Episode Summary

  1. LanguageTool decided against using existing LLMs like GPT-3 or GPT-4, because developing their own models offered cost, speed, and accuracy benefits, focusing on a balance between performance, speed, and cost.

  2. The tool is designed to work with low latency for real-time applications, catering to a wide range of users including academics and businesses, with the aim of accurate grammar correction that is not intrusive.

  3. Bartmoss discussed the nuanced approach to grammar correction, acknowledging that language evolves and user preferences vary, necessitating a balance between strict grammatical rules and user acceptability.

  4. The company employs a mix of decoder and encoder-decoder models depending on the task, with a focus on contextual understanding and the challenge of maintaining the original meaning of text while correcting grammar.

  5. A hybrid system combining rule-based algorithms with machine learning provides nuanced grammar corrections and explanations for the corrections, enhancing user understanding and trust.

  6. LanguageTool is developing a generalized GEC system, incorporating legacy rules and machine learning for comprehensive error correction across various types of text.

  7. Training models involves a mix of user data, expert-annotated data, and synthetic data, aiming to reflect real user error patterns for effective correction.

  8. The company has built tools to benchmark GEC tasks, focusing on precision, recall, and user feedback to guide quality improvements.

  9. The introduction of LLMs has expanded LanguageTool’s capabilities, including rewriting and rephrasing, and improved error detection beyond simple grammatical rules.

  10. Despite the higher costs associated with LLMs and hosting infrastructure, the investment is seen as worthwhile for improving user experience and conversion rates for premium products.

  11. Bartmoss speculates on the future impact of LLMs on language evolution, noting their current influence and the importance of adapting to changes in language use over time.

  12. LanguageTool prioritizes privacy and data security, avoiding external APIs for grammatical error correction and developing their systems in-house with open-source models.

Bartmoss St. Clair: LinkedIn profile



Speaker Key:

DD Damien Deigh

PD Philipp Diesinger

BC Bartmoss St Clair


DD: This is the Data Science Conversations Podcast with Damien Deigh and Dr. Philipp Diesinger. We feature cutting-edge data science and AI research from the world’s leading academic minds and industry practitioners, so you can expand your knowledge and grow your career. This podcast is sponsored by Data Science Talent, the data science recruitment experts [upbeat music]. Welcome to the Data Science Conversations Podcast. My name is Damien Deigh, and I’m here with my co-host once again, Philipp Diesinger.

PD: Hi guys.

DD: So, today we are talking to Bartmoss St Clair about industry use cases for large language models. And by way of intro, Bartmoss holds both maths and physics degrees from Heidelberg University. He’s been working as an AI researcher and engineer for many years at the likes of Harman International and Samsung. He has also been a guest researcher at the Alexander von Humboldt Institute in Berlin. Currently, Bartmoss is the head of AI at LanguageTool. They are a German software company that has a writing assistant for multilingual proofreading, grammar and style checking, and they do that in over 30 languages. Bartmoss has a deep understanding of the maths behind AI and has been developing products in the NLP space since 2012. He works as an advisor to boards and companies, large and small, and he has even found the time to develop an open-source community he created called Secret Sauce, where they focus mainly on voice assistants. We’re delighted to have him here. Welcome to the podcast, Bartmoss.

BC: Thank you, Damien. It’s a pleasure to be here. This is actually my first time on a podcast, so I’m actually extra pumped for this.

DD: Great. So, we normally start with your own story, Bartmoss. So, please do tell us how did you go from maths and physics at Heidelberg into data science and AI?

BC: I ask myself that all the time. Honestly, what happened was ever since I was a kid, I really wanted to be a physicist, and I did that for many years at university. I just found, honestly, that the academic life didn’t suit me, it just didn’t fit for me. And when I started questioning what I wanted to do next, an opportunity presented itself. A colleague of mine, his father is a professor and worked with AI way back then, in, I think, 2012 or 2013, somewhere in there.

And there was this very interesting project with natural language processing, dealing with automating content governance systems for banks. And purely because of nepotism, honestly, sometimes it’s not what you know, it’s who you know. I got involved in that and I started studying and learning about it as quickly as possible. And then I founded a company to build up the solution for, I think it was five or six different languages. And I mean, this was back in the stone ages of NLP. And I really discovered a great passion for this. And I just knew that’s what I wanted to do for the rest of my life. And sometimes you just get lucky like that, I guess.

DD: And obviously, you find yourself now at the cutting edge of actually using LLMs in the business world. We’re obviously in the very early stages of these use cases, but you have some solid ones to talk about. So, let’s start there. Can you give us an overview of what you’re doing at LanguageTool?

BC: So, at LanguageTool, one of the use cases we have, which of course is the primary use case for us, is grammatical error correction, or GEC for short, where obviously someone writes something like a sentence or a text, and then they want their grammar checked, and they want errors replaced with the correct grammar. That’s, of course, a very basic use case. LanguageTool itself has existed for about 20 years, but of course they didn’t use AI or machine learning back then. As head of AI, one of the things we wanted to do was create a general grammatical error correction system, able to catch all kinds of errors for all languages possible and correct them. Now, that’s a kind of interesting use case in my opinion. I mean, back when I was at university I actually taught English, and I always had an interest in language, and I corrected a lot of grammar back in the day.

So, it really fit with what I did. And how it really works is simply that a user first writes a text, and then of course our system needs to somehow be able to correct it. Now, how exactly does that work? Well, you’re going to want to use a model for that. And one very big question there is: do you use a very large model that exists already, something like GPT-3 or GPT-4, and just use prompting, or do you create your own models for this? And one thing we found is that if you create your own models to do this very specifically, it’s on one side cheaper, but on the other side it’s faster and it actually works better and scores better. And so, we’ve created our own models for doing exactly this kind of task. There are a lot of questions there when it comes to the business case, with latency: how accurate or correct do you want your system to be?

There are a lot of trade-offs with, of course, what kind of resources you have to run this in production. Of course, when you run for millions of users, you have to make sure to have a good trade-off between performance, speed, and price, right? There are also a lot of discussions about whether to use encoder-decoder models, something like [inaudible 00:05:53], T5, or other sequence-to-sequence based models, or just a decoder model. Decoder models have become very, very popular, and we’ve seen a lot of scaling behind decoder models, such as GPT-3, GPT-4, LLaMA, LLaMA 2, et cetera, with many more coming out, it seems, every week, and there are a lot of great tools for them. But sometimes the question is how big a model do you need for your purpose? And you have to benchmark that and test it to see how well it works.

PD: Can you dive a little bit deeper to give us a better understanding of the business case? Like who are the users, how does it work? Is it a real-time application? Is it offline, online? Like, how do we have to envision this grammar tool?



BC: This is a real-time application that needs to work with low latency as users are typing in a document, or on a website, or in any way in their browser. For example, we have an application for the desktop. There are many, many ways you can use LanguageTool. And of course, it needs to work as quickly as possible; the use cases are completely varied in this case. I mean, it can be academics writing, it could be for business. We have both B2B and B2C customers. So, really, we don’t have a one-size-fits-all for our customers. In the end, it really varies, but one thing is for sure: we need to find a good balance between grammar correction and annoying people. That’s something that’s really kind of funny with these systems, people just assume that the grammar’s either correct or not.

But there are cases where we’ve seen that many users don’t like a rule, and you look at the analytics and you think, well, technically it’s correct, but if enough people don’t like something, you have to debate whether you want to turn it off. And a good example of this is in English with “to whom” or “for whom”. A lot of times nowadays in English, people just say “who”. And grammar changes over time, and we have to be mindful of that. Another funny case there is that a lot of speakers of Slavic languages don’t use the definite and indefinite articles, “the” and “a”. You notice that they say they don’t want this rule, which a lot of times suggests an edit that puts in an indefinite or definite article. But in the end, it is correct. I hate to tell you guys, it’s correct to use those articles there. We’re pretty sure about that. And you have to really strike a balance with your users, and I think that’s something really, really important there.

PD: How many users do you have?

BC: We have several million users in many different languages, six primary languages. However, we support over 30 different languages worldwide. And we have employees from all over the world. It’s a very great and diverse place to work, that’s for sure.

DD: You mentioned decoders and encoder-decoders. What are the different scenarios where you might recommend one over the other?

BC: I mean, it really depends, of course. If you’re starting out by maybe prototyping, maybe you want to use a very large decoder, which is very popular nowadays, using a prompt so that it has some sort of emergent behavior, so that you can just do zero-shot or few-shot prompting; that’s quite good for that. But maybe you want to do a task like translation, for example. And generally, it’s found that sequence-to-sequence encoder-decoder models perform very well for that. And of course, you have to think of things like the context window. Do you want to handle large amounts of text, or do you just want to work on a sentence level? There are a lot of things to consider here.

DD: So, how does GEC work in practice then? And what are the challenges with what you’re doing?

BC: Well, for GEC in practice, of course you have to start with some really good data. And generally, you want to fine-tune a model with very good data where you have sentence pairs, or text pairs, between one that could possibly have mistakes in it and one that is completely golden. Once you have fine-tuned your model, of course you need to check to see how it’s working. Now, you would think it would be quite simple to check how well it’s performing, but you have to remember there can be multiple valid grammar corrections for a sentence. So, you have to be able to handle that. And I think that’s a very interesting challenge where it’s not just black and white.

Other challenges are, of course, things like hallucinations or extreme edits, if you want to call it that; you don’t really want the output to be changed too far. You want it to keep the same meaning as the original sentence, and maybe you don’t want certain words changed, you just want the grammar fixed. And there’s always a risk with these models that they will change much more. And there are a lot of different tools to solve these types of problems, everything from checking edit distances with Levenshtein distance to checking similarity with cosine similarity. And there are a lot of different approaches there. There are also some interesting things that I’ve read about with tagging edits.
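The over-edit guard Bartmoss describes can be sketched in a few lines. This is purely an illustration of the idea, not LanguageTool’s code; the 0.3 threshold is an arbitrary assumption.

```python
# Sketch: reject a model's correction if it strays too far from the input,
# measured by Levenshtein edit distance normalised by input length.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def accept_correction(source: str, corrected: str, max_ratio: float = 0.3) -> bool:
    """Accept only if the relative edit distance stays below max_ratio."""
    if not source:
        return corrected == source
    return levenshtein(source, corrected) / len(source) <= max_ratio

print(accept_correction("She go to school.", "She goes to school."))      # → True
print(accept_correction("She go to school.", "Education matters a lot."))  # → False
```

A small grammatical fix passes the gate, while a wholesale rewrite is rejected even if it is fluent, which is exactly the "extreme edit" failure mode described above.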

There’s something called ERRANT, which is very popular, especially academically, for tagging sentence edits; that can also reduce this issue of over-editing, of over-changing a sentence. And also, sometimes there’s a question of how much you can change a sentence before you shouldn’t change it anymore. The great thing about LanguageTool, like I said, is that it’s existed for 20 years, and so there are a lot of standards and practices in there that developed through building a rule-based system, which we could inherit into doing this with artificial intelligence.

PD: The rule-based system that you mentioned from the past, that system is still being used at the moment? So, you have a hybrid system, generally AI on top of the rule base?

BC: Absolutely. When it comes to really basic things that can be formulated into rules, it’s very cheap and very accurate, and it works. There’s no reason to fix that. A lot of times it’s the more complex, contextual grammar issues, where you can’t just create a simple rule because there are so many exceptions, that machine learning is ideal for. And of course, running machine learning models with inference in production can be more expensive than just writing a simple rule with, let’s say, regex, or Python, or whatever you would want to use.

PD: So, Bartmoss, you explained that there are basically rule-based systems and you also use AI, with the rule-based systems based on, for example, regular expressions. So, how do they work together with the AI systems?

BC: Well, one thing that is very important for us is that we don’t just correct the grammar, but we explain to the user why, the rules behind it. And every time we have a match, we also have to explain the reasoning behind it. And this means that every match has a unique rule ID. So, for certain rule-based systems, you have an ID for those, and you have an ID for all of the types of matches that can occur in the machine learning aspects, and you can then prioritize those rules. For example, let’s say that nine times out of ten there’s a rule that works with the rule-based system, but then it triggers for that one time where there’s a deeper context; you can prioritize the AI model over the rule-based system. And it works pretty well, actually. I mean, I’m honestly sometimes surprised. We don’t have endless correction loops we get stuck in, or anything like that. We handle that pretty well.
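The rule-ID prioritization described here can be made concrete with a small sketch. Everything in it, the `Match` fields, the rule IDs, the priority scheme, is a hypothetical illustration, not LanguageTool’s implementation.

```python
# Hypothetical sketch of merging rule-based and ML matches by priority:
# when two matches cover overlapping spans, only the higher-priority one
# (e.g. the contextual AI model) is shown to the user.

from dataclasses import dataclass

@dataclass
class Match:
    rule_id: str   # unique ID used to explain the correction to the user
    start: int     # character span in the text
    end: int
    priority: int  # higher wins when spans overlap

def merge_matches(matches: list[Match]) -> list[Match]:
    """Keep at most one match per overlapping region, preferring priority."""
    kept: list[Match] = []
    for m in sorted(matches, key=lambda m: -m.priority):
        if all(m.end <= k.start or m.start >= k.end for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)

rule_match = Match("COMMA_BEFORE_CONJUNCTION", 10, 18, priority=1)
ai_match   = Match("AI_GEC_CONTEXTUAL",        12, 20, priority=2)
print([m.rule_id for m in merge_matches([rule_match, ai_match])])
# → ['AI_GEC_CONTEXTUAL']  (the AI match wins on the overlapping span)
```

Non-overlapping matches from both systems survive side by side; only conflicts are resolved by priority, which mirrors the "rule works nine times out of ten, AI takes the deeper-context case" behaviour described above.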



PD: So, Bartmoss, you mentioned that you were trying to develop a generalized system, generalized GEC. How far away are you from that, or how do people need to think about your AI system? Is it like specialized agents that all specialize in one task that they solve and the rule-based system then uses them depending on the need? Or is it really like one generalized AI system that basically answers some of the questions?

BC: Like I said, we do keep our legacy rules where they function very well and always work. I mean, there are certain cases where something’s always going to be true, like capitalizing at the beginning of a sentence or punctuation at the end, and so on and so forth. Very basic things. But for general grammatical error correction, this is something that we’ve worked on for a while and actually do have running in production. It’s quite an interesting system. As I said, we have these many, many layers in production, from rule-based systems, to specific machine-learning-based solutions, each for one specific type of correction. We call that system Hydro Leo. And we also use what we call GEC, the general grammatical error correction, which can solve everything. Oh, well, not everything. There are certain things you can’t teach a model sometimes and you have to handle in post-processing, but I don’t know, I hope I’m not giving away too many secrets here [laughter].



PD: How much data do you need to train those models? Like what’s the role of the data? What kind of data do you use? Is it all natural data? Is it synthetic data?

BC: In the end, it is a mixture. And I mean, we do have opt-in user data that we collect from our website. We also have golden data that we’ve annotated internally ourselves and from our language experts. We’d be lost without them. And so, we have people for every language. We have several people who actually go through and review and annotate data for us, which is very, very, very helpful. We do also generate data, absolutely, synthetic data. And generally, when it comes to these models, you have to ask yourself do you want to train these things in stages or do you want to train them all in one go? Is it better to use synthetic data for a part of it or should you just use fewer points of data and focus on quality?

These are a lot of questions that we have to answer constantly. And I don’t know how much detail I can go into there, but it’s quite an interesting mixture of tasks and methods that we use, with generation, with user data, with data that we internally create and annotate. One thing that’s very important, that you have to watch out for, is of course the distribution of the data. It really should be how it is for the users, and we want to keep that distribution as close to how the users actually use this as possible. And so, you don’t want to get too far away with your data distribution, especially with errors. I mean, there are errors that occur more often than others, spoiler alert: for example, commas. There are a lot of errors with commas. And so, you see those way more often than certain other types of errors. And you want to make sure that the distribution stays roughly the same, otherwise you won’t have as good a performing model.
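The point about synthetic data mirroring the real error distribution can be illustrated with a toy generator. This is an assumed approach for a single error type (dropped commas), not LanguageTool’s pipeline; the rate parameter stands in for the frequency observed in real user data.

```python
# Illustrative sketch: build synthetic (noisy, gold) training pairs by
# injecting one error type, comma deletion, at a rate chosen to mirror
# how often users actually make that error.

import random

def corrupt(sentence: str, comma_drop_rate: float = 0.5, seed=None) -> str:
    """Randomly delete commas to create the noisy side of a training pair."""
    rng = random.Random(seed)
    return "".join(ch for ch in sentence
                   if not (ch == "," and rng.random() < comma_drop_rate))

gold = "However, the model, unlike the rules, sees context."
noisy = corrupt(gold, comma_drop_rate=1.0, seed=0)
print((noisy, gold))
# with drop rate 1.0, every comma is removed from the noisy side
```

A fuller version would maintain one such corruption per error category, with rates estimated from opt-in user data, so the synthetic mix stays close to the real distribution.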

DD: Yeah. It takes me back to the very first podcast we did, which was with Professor Philipp Koehn from Johns Hopkins University, and it was on machine translation. And he talked about the very subjective nature of language. So, given that, how do you effectively benchmark the GEC and its tasks?

BC: That’s something we’ve had to build a lot of tools for from scratch, because that just doesn’t really exist out there. And as I said, you can get something marked wrong that is actually a correct correction, because there is more than one way to correct a sentence, right? Or a text. And in those cases, the most obvious method, which is kind of tiring sometimes, is just to collect every possible variant of a correct version of a sentence. But that’s a monumental task, and it’s not always the best way to do it. There are a lot of other ways you can do this; obviously, you need to ask yourself what’s more important, precision or recall? When it comes to the users, we do have some really good analytics there.
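The "more than one correct version" problem can be sketched as multi-reference matching: a system output counts as correct if it matches any of the collected gold variants. This is a deliberately minimal illustration; real GEC benchmarks align individual edits rather than comparing whole strings.

```python
# Minimal sketch of multi-reference scoring for GEC: a hypothesis is
# correct if it matches ANY gold reference after light normalisation.

def normalise(text: str) -> str:
    """Collapse whitespace so trivial spacing differences don't count."""
    return " ".join(text.split()).strip()

def is_correct(hypothesis: str, references: list[str]) -> bool:
    hyp = normalise(hypothesis)
    return any(hyp == normalise(ref) for ref in references)

refs = ["We do not know who it is.", "We don't know who it is."]
print(is_correct("We don't  know who it is.", refs))  # → True (matches 2nd reference)
print(is_correct("We know who it is.", refs))         # → False
```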


I mean, we can’t see what people are writing or anything like that. We are very data-privacy focused. Of course, if you opt in to have your text used for machine learning purposes by us, then obviously we can, but generally, we can’t read what people are writing; that’s not saved anywhere. What we can see is whether a user applied our suggestion, said no to it, or just ignored it. And we guide ourselves a lot by that. I think, honestly, there is no 100% silver bullet, it’s just a barrage of methodologies there to ensure that we are giving our users the best possible quality corrections that they can trust, whether they’re writing a thesis or a legal document. And we take that very, very seriously. And I think because we’ve been doing this for 20 years, that gives our brand a very big strength, because we have ironed out a lot of those issues.

PD: And how do you measure the performance of your system?

BC: Like I said, there are a lot of different measures we use for performance: obviously, the typical F1 scores and things like that when you’re training or fine-tuning a model, generally on your evaluation dataset, that is something you look at. And of course, that’s not 100% foolproof, because the model can offer suggestions that are correct but that you might not have in your references. But it’s a good rule of thumb. Also, as I said, the user analytics. When you put something online you can see very quickly if the users really hate it. We do partial rollouts, we do A/B tests, obviously, the standard practices that you see within companies when it comes to rolling out models; we like to release regularly, start with partial rollouts, and just see what our users say to it.
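The F1 scoring mentioned here can be shown in a few lines, treating each correction as an edit tuple. Again an illustration only; production GEC evaluation (e.g. with ERRANT) first aligns and classifies edits before counting them.

```python
# Sketch: precision, recall and F1 over sets of edits, where an edit is a
# (start, end, replacement) tuple extracted from a sentence pair.

def f1_score(system_edits: set, gold_edits: set) -> tuple:
    """Return (precision, recall, F1) for system edits against gold edits."""
    tp = len(system_edits & gold_edits)              # edits both agree on
    precision = tp / len(system_edits) if system_edits else 0.0
    recall = tp / len(gold_edits) if gold_edits else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

gold = {(4, 6, "goes"), (17, 17, ",")}           # two required corrections
system = {(4, 6, "goes"), (30, 34, "their")}     # one hit, one spurious edit
print(f1_score(system, gold))  # → (0.5, 0.5, 0.5)
```

Precision punishes the spurious edit, recall punishes the missed comma; the choice of which matters more is exactly the precision-versus-recall question raised above.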

That’s a very important metric for us, the one from the users. We also do manual reviews, obviously; that’s also very time-consuming. But something we find very important is taking subsets of data and having professional language experts review them. And that’s one point that I think is quite important: these aren’t just the usual data annotation folks, these people are very specialized in language, because a regular person who does data annotation is not really qualified to know exactly what’s 100% grammatically correct in a language. Being a native speaker isn’t enough; otherwise we wouldn’t have a business case, because people wouldn’t make mistakes, right? So, you have to find very, very highly qualified people to review the data, who really have a very deep understanding of language.

DD: Can you talk about the platform before LLMs and how it has improved the user experience and the performance now that you have LLMs?

BC: There are two main use cases for us with LLMs. One, of course, is general grammatical error correction, obviously, and we also use them for rewriting or rephrasing. In these two areas, well, first off with paraphrasing, you can’t really do it without these kinds of models. And it was a huge use case for us, and it created a lot of stickiness when we first released it; we saw it was a really big hit, and that completely changed things, because it’s something you couldn’t really do before with previous systems. And of course, with grammatical error correction, you can really catch things that you couldn’t catch before. Before such models, the best we could do was train much, much smaller models on a very specific error, and each would just handle that specific error, something like a missing comma.

And you would have to think of all those cases. And that’s very similar to rule-based, where you have to literally think of each and every type of error possible. Whereas when you’re training a model, as long as you can correct the output sentence correctly, it can catch a lot of things that are much deeper, that you would never think about, very deep contextual things. And you can change things with style, you can improve fluency. There are so many use cases for language, and we can bring that into all kinds of applications. I mean, I myself use it, of course, every day, which is really nice. I get a free premium account, so I’m using it in my Slack, I’m using it in my emails, and I find myself, even though I taught English for several years, thinking, oh, I missed a comma there. And that’s quite helpful. So, [laughter].

PD: Regarding that switch to using LLMs more, did that have an impact on the cost side of the consideration?

BC: Yeah, absolutely. There was a cost trade-off there, because we host all of our own infrastructure for this. With our GPU servers, we found that it’s kind of a no-brainer. You get higher retention when you find more errors and improve people’s spelling and grammar. And when you do this, you get better numbers, you get better conversion rates for premium. So, it’s a worthy investment for us. Of course, you really have to consider these things with costs, and bringing the cost down is something that we actively do. And we’ve worked a lot on compressing models, accelerating models, and that’s its own little niche area. These things aren’t plug-and-play, as I assumed they were; I thought we could just plug it in. And you end up pulling your hair out trying to get some framework for compression working, something that’s new, and you think it should just work, and you end up building all of your own stuff and even fixing upstream bugs, and oh my goodness, I could talk a whole podcast just about that, I think [laughter].



PD: I can imagine that. A little bit of a philosophical question: you mentioned at the beginning that language is something that’s constantly changing, but now we are entering an era where LLMs and generative AI have an impact on our language, right? It’s correcting us, and it’s writing lots of articles and texts and so on that are machine-generated. Do you think that at some point in the future, in 10 years or so, LLMs will start having an impact on shaping language?

BC: I think to some degree they already do. I mean, we can, a lot of times, detect whether a longer text is generated. And I’ve noticed a lot, by playing around with different LLMs, that they like to inject certain words all the time. And I don’t know if that’s for watermarking purposes or what it actually is, because a lot of times they’re not very transparent about these things. But I’ve noticed that, and I think as we rely on them more, we will definitely see a greater impact on language. But it’s always important to note that these things are just predicting the next token based on a huge amount of data from people. And so, as long as you’re training or fine-tuning new models, that will change with the times also.

We might have to add in new vocabulary, possibly, but otherwise, I think the thing with language is that language is very fluid and it changes over time, and we have to change with it. And I think there’s nothing wrong with that. And maybe one day we will remove the indefinite articles in language. Who knows? I don’t know. I mean, how much longer before, in German, there might not be a formal “you”, or in other languages? All of these things are constantly changing, and there are so many English words invading other languages as things become way more global. And I think that will continue, and as long as we’re able to communicate with each other effectively and beautifully, I don’t think there’s anything wrong with that.

PD: And you mentioned a point that I found quite interesting. So, you are trying to detect whether text or some data has been generated by a model or whether it was written by humans. I’m assuming you need to do that to ensure input data quality. How do you approach this?

BC: We don’t specifically offer this as a product, but of course this is something that we do play with and check out. And I mean, the easiest way to do it is that you can take an LLM and figure out what the probability is of the next word in a sequence. And if it seems too probable that the LLM would predict these words in this sequence, then it might be generated, long story short, right?
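The detection idea described here, scoring how predictable a text is under a language model, can be made concrete with a toy stand-in. A real system would use an LLM’s token log-probabilities; the character-bigram model below exists only to keep the sketch self-contained and runnable, and the "generated if too probable" decision rule is a simplification.

```python
# Toy illustration: texts that a language model finds unusually predictable
# (high average log-probability) are candidates for being machine-generated.

import math
from collections import Counter

def train_bigram(corpus: str):
    """Count character bigrams and unigrams from a reference corpus."""
    pairs = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)
    return pairs, unigrams

def avg_logprob(text: str, model, alpha: float = 1.0, vocab: int = 128) -> float:
    """Mean log P(next char | current char) with add-alpha smoothing."""
    pairs, unigrams = model
    logps = [math.log((pairs[(a, b)] + alpha) / (unigrams[a] + alpha * vocab))
             for a, b in zip(text, text[1:])]
    return sum(logps) / len(logps)

model = train_bigram("the cat sat on the mat. " * 50)
predictable = avg_logprob("the cat sat on the mat.", model)
surprising = avg_logprob("zq xv jk wq pf.", model)
print(predictable > surprising)  # → True: in-distribution text scores as more probable
```

Swapping the bigram model for a real LLM turns `avg_logprob` into the per-token perplexity check Bartmoss sketches: a suspiciously high score suggests the text sits exactly where the model itself would have gone.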

PD: So, when you find a text that seems to have been generated, do you then take it out of your training data, or how does that work?


BC: Well, for us that’s not really a consideration. I mean, whether it’s generated or not is really immaterial for us. This is just something out of interest. We do a lot of different research, and we have a lot of researchers and people who have a really heavy background in research and open source at LanguageTool when it comes to AI. And so, we do a lot of different types of experiments, and that’s some of it. And we do have partner companies we work with for a lot of these things. And we’re part of a larger group company, so these are a lot of things that we consider, but for us, it’s just more out of interest. What people are doing with the language, and whether it’s generated or not, is really immaterial to our business case. In the end, even the greatest, latest LLMs can make grammar mistakes. And I’ve seen that again and again. And so, they’re not foolproof themselves. Our job is just to make sure that we get it correct for you.

DD: And I can’t remember if you’re allowed to say this or not, but you are using open-source LLMs at LanguageTool, you’re not combining it with GPT via the API?

BC: No. We don’t use external APIs, especially for grammatical error correction. For one, this is a privacy concern, because a lot of our users are very privacy focused and our data generally stays within the EU. And like I said, we also don’t save just any data from our users; it’s only when it’s opted in from the website. We don’t want to pass that externally anywhere, because we have a high respect for the data privacy of our users, even towards third parties. And so, we keep that all in-house, the processing, the inference that is. And so, that’s something very important to us. Everything we’re doing is completely built internally. And we use a variety of different types of models, and we’re always experimenting with new ones.

DD: How are you dealing with the hallucination problem, when a correction is made and the model generates an output text?

BC: You do see these things as problems occasionally, and like I said, there are a lot of different ways to solve this. You can measure everything from the edit distance, like Levenshtein distance, to things like cosine similarity, to make sure you’re not changing the meaning. You can look at the types of edits by doing some sort of edit typing. You can use a variety of methods. And we do use a variety of methods to ensure that these models don’t just completely change an output for our users. And of course, that’s something we have to filter in the data that we get also; you have to be careful. Maybe another word would be better, but then that’s style. And we want to offer users the choice: if they want to have the style changed, then they can have that. But if they just want grammar, then we’ll just give them grammar. And we like to offer users a multitude of different ways of correcting, and we don’t want to mix it all together in one big bag of here’s what you get and we’re going to change your whole sentence. Unless that’s what users want; we can offer that too.
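The cosine-similarity check mentioned here can be sketched with a simple bag-of-words version. A production system would more likely compare sentence embeddings; this only shows the gating logic, and the 0.6 threshold is an arbitrary assumption.

```python
# Sketch of a meaning-preservation gate: accept a correction only if its
# bag-of-words cosine similarity to the source stays above a threshold.

import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def keeps_meaning(source: str, corrected: str, threshold: float = 0.6) -> bool:
    return cosine(source, corrected) >= threshold

print(keeps_meaning("he go to the market", "he goes to the market"))  # → True
print(keeps_meaning("he go to the market", "the economy is growing"))  # → False
```

Together with an edit-distance cap, this kind of gate separates "fix the grammar" from "rewrite the sentence", so style changes stay opt-in as described above.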

DD: I would like to take a brief moment to tell you about our quarterly industry magazine called The Data Scientist and how you can get a complimentary subscription. My co-host on the podcast, Philipp Diesinger is a regular contributor, and the magazine is packed full of features from some of the industry’s leading data and AI practitioners. We have articles spanning deep technical topics from across the data science and machine learning spectrum. Plus, there’s careers advice and industry case studies from many of the world’s leading companies. So, go to to get your complimentary magazine subscription. And now we head back to the conversation. Switching up to the second use case, Bartmoss, can you give us the overview of how you’ve been using chatbots for learning simulations with Hey LLaMA?

BC: Hey LLaMA uses LLMs as chatbots to help creators turn training material into AI simulations across a broad and diverse range of use cases. We’re talking everything from language learning, sales, leadership, education, et cetera. And this is something very interesting, embracing LLMs to actually help people learn. And I mean, that’s something I’m passionate about. And of course, the co-founders Dan and [inaudible 00:31:05] are absolutely passionate about it, ’cause I think they’ve been doing this now for over two years. And this is quite an interesting area, something that’s really evolving with LLMs, something you really couldn’t do before. And of course, at the onset it wasn’t really focused on chatbots. It was interesting because they were creating a lot of different AI-based games and other things to enable learning. But then they found that the chatbot feature was just the most sticky one.

And when you’re doing startups, you never know what’s going to be successful. It’s always a bit of a surprise to see what people actually like in the real world, right? And that’s how it became more focused on chatbots, based on user feedback, and it just kind of exploded. They’re relying on LLMs, using either prompting or fine tuning, to create simulations for users to be able to learn. And I mean, I’ve used it, for example, for job interviews, right? I wanted to practice for job interviews and things like that, and I started practicing and trying that out. That was an interesting use case. And I think it might’ve helped me get my job at LanguageTool, because I did simulate with that, and that was fun back in the early days when we were still trying to figure out those features.

It can simulate multiple personas. So, it’s not just like you’re talking to the AI, but it can simulate multiple personas and different types of situations. Of course, this is something you really have to watch out for at the same time, because you can be dealing with factual information here and LLMs can make mistakes, such as the reversal curse, which is something, Damien, you’ve brought up before. There are cases where LLMs are just not as intelligent as we think they are, and I think that’s something you have to watch out for.

DD: Can you just quickly explain the reversal curse?

BC: It’s, when you would have some sort of, I guess, easy logical conclusion that you could get. Maybe something where if A, then B, and maybe if B, then A. So, it goes in, let’s say both directions. But then it’s misunderstood in one direction, perhaps, and you say, well, it goes, it should work both ways. And then the model might not understand it. And so, you might say, A is B and B is A, and then you’ll say, well, what is B? And then it will say C or something like that. I don’t know how I, maybe that’s not a very good example. I don’t know. Damien, do you have maybe a better example for us or?

DD: Yeah. I think the example used in a recent paper is they ask the LLM who is Tom Cruise’s mother, and it answers perfectly, but then it can’t tell you who his mother’s famous son is. So, that would be, I think, an example of that. So, you mentioned prompting and fine tuning, so maybe talk about when that’s the best method versus retrieval augmented generation, RAG, and that whole interplay and how people should decide there.

BC: Yeah. This is a really big question, and I mean, of course you can do all of the above, obviously, but generally, this is something where you have to test. With retrieval augmented generation, that is anywhere you would inject some sort of information into a prompt with a query, to give some sort of context or information to the LLM. Now, you can do this on a really big level where you would use some sort of a vector store or vector database, where you have huge amounts of data and articles and you have to match with the query to actually extract out information and then prompt that further. That’s something that’s very popular nowadays, especially with chatbots. And you can do that. I would say the biggest use case there is something like open question answering systems.

You can use an API or you can use a vector store, something like that, and try to match the user’s query and then prompt that right back in. Hey LLaMA is doing that to an extent; they’re not actually using vector stores currently, because that’s not necessary. And that’s another thing with business: you don’t do things just because they’re cool, right? You have to stay business focused. And they are injecting information such as what is the scenario, what is the user’s name, things like that, into the prompt that they build for the model. And of course, generally when it comes to fine tuning, you have to understand that facts can change, things can change. And when you’re fine tuning a model, if that information can change, that might not be the best course of action.

You might not want to do that. But then again, maybe you want to fine tune generally on how the prompt is displayed or how it comes out. But you still want to give it slots for facts. And this is very similar, actually, it seems like history’s repeating itself. This reminds me of NLG engines back in the old days with slot filling and things like that. Before LLMs, this is kind of how you would handle things: you would write out text and then you would have little slots in it that you would dynamically fill programmatically. And it’s quite similar to that. And I think the revolution of course is in the LLMs themselves, but the techniques that we’re using are still quite similar to slot filling in the natural language generation engines of years ago.
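
The slot-filling pattern described here can be sketched as follows (the slot names, template text, and scenario are invented for illustration; this is not Hey LLaMA’s actual prompt):

```python
# A fixed template with named slots, filled dynamically at request time --
# the same pattern old NLG engines used, now applied to building LLM prompts.
PROMPT_TEMPLATE = (
    "You are role-playing as {persona} in a {scenario} simulation.\n"
    "The learner's name is {user_name}. Stay in character and respond "
    "to their last message:\n{user_message}"
)

def build_prompt(persona: str, scenario: str,
                 user_name: str, user_message: str) -> str:
    """Fill the template's slots; the result is what gets sent to the LLM."""
    return PROMPT_TEMPLATE.format(persona=persona, scenario=scenario,
                                  user_name=user_name, user_message=user_message)

prompt = build_prompt(
    persona="a skeptical hiring manager",
    scenario="job interview",
    user_name="Alex",
    user_message="Tell me about a project you are proud of.",
)
print(prompt)
```

Because the facts live in the slots rather than in the model weights, they can change per request without any retraining, which is exactly the trade-off Bartmoss raises.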

That’s, I think, what it really boils down to: what is your use case? Can you just fine tune a model? Like, obviously, with the previous business case we talked about, grammatical error correction, you can create a model or fine tune a model, or you can even use something new and splashy like LoRA, if you’re so inclined, if you want to train such a big model and you can’t fit it there. But when it comes to factual things, and especially things that can change over time, you definitely want to inject that information from some API or database into your prompt.
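
A minimal retrieval-augmented generation loop along the lines described above might look like this (a hedged sketch: the keyword retriever stands in for a real vector store, the documents are made up, and `call_llm` is a placeholder for whatever model API is used):

```python
def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query.
    A production system would use embeddings in a vector store instead."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Inject the retrieved context into the prompt, so facts stay
    out of the model weights and can change without retraining."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "LanguageTool supports over 30 languages.",
    "The reversal curse describes LLMs failing to invert learned relations.",
]
prompt = build_rag_prompt("How many languages does LanguageTool support?", docs)
print(prompt)
# The prompt, with the matching document injected, would then be sent to
# the model, e.g. answer = call_llm(prompt)  -- call_llm is a placeholder.
```

Swapping the toy retriever for an embedding search changes the `retrieve` function only; the prompt-injection step stays the same.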

PD: I mean, you’ve talked a couple of times about LLMs and that they don’t come off the shelf. There are, of course, many different ways of adjusting an LLM and making it perform better. You mentioned training it, fine tuning, but then of course you can also do a lot of prompt engineering, you can build agents and tools and so on. What is your practical experience in that space? What has the most impact, and what does your workflow look like? What do the steps of the process look like to make an LLM perform on a specific problem that you have?

BC: In so many ways, it’s becoming easier and easier because there are so many tools available that just didn’t exist before. And I found myself writing a lot of code and building something from scratch just to find out that there was an amazing open-source tool or something out there that I didn’t know about. And then someone sent me the repo and I kicked myself going, why didn’t I just use that? I think one of the more popular ones out there, for example, when it comes to these kinds of things is LangChain. That’s something that a lot of people are using nowadays. That’s just one of them; there’s a lot of other tools available out there. And I think, in the end, the improvements of the LLMs themselves, whether open source, closed source, API based, et cetera, that’s great.


But in the end, I think it’s the tools that make a difference. And we didn’t see a lot of great tools back in the old days with natural language generation, obviously. There was some stuff from universities and you kind of had to build it all yourself. And nowadays these tools are readily available and it makes it so easy. You don’t really need to go down to a deep level and do everything by hand anymore. And I’ve seen people picking up and coding their own chatbots with low-code, no-code solutions. And I think that’s where a lot of the big boost comes from, really, in making it easy for people who maybe aren’t AI researchers of many years to be able to build something and just push the button and get it working.

DD: Can you give us an overview of the downfalls of relying on the external APIs and the closed source models?

BC: Absolutely. I mean, when it comes to using an external API, it’s quite easy to start off and prototype, and I think that’s very good. But there are questions, of course, of data privacy, which is a big one there. What data was used on these models, how were they trained, things like bias. And you don’t really have as much control when you’re relying on them. And some of the biggest issues I see as an advisor to companies are the latency and the cost. And I think, especially costs, this is something that’s greatly underestimated by companies. I see this time and time again when people hit me up for micro consulting or general advisory work: they don’t really understand the cost involved, especially if you actually have a feature that’s a hit, and you can see that there.

And at the same time, there are a lot of user complaints on latency, of this is taking too long to get a response. And you do see that quite often. A very good example of that would be, actually, going back to LanguageTool, we created a system for rewriting, or rephrasing, or paraphrasing. There are a lot of terms for that, but in the end, it’s taking a style such as formal and you take a sentence and you make it more formal, for example. That’s a concrete example of that. And we started off using OpenAI for that just to prototype it and see how it would work, if users liked it. Because like I said, you never know what people want out there.

And that’s why I stay out of the product department, I’ll stick to AI. I think AI is easier than product in the end, because you never know what people want. And it turns out that people really loved this rephrasing, rewriting feature, so much so that we saw huge, huge growth in users and the usage. And we celebrated on one side, but on the other side we were thinking, oh, the costs are going up. And sooner or later that hit a red line, and it hit the red line a lot quicker than we expected it would. And that’s when the CEO calls you up and says, well, what are we going to do about this? You’re the head of AI, do something. And I go, okay, I’m on it. And we realized we needed to make our own in-house models.

In-house is the term we use to try to differentiate from using external APIs for LLMs. And this is something that we had to, of course, develop very quickly. And we actually really found that when you get the right data, you get the right model, and you get the right service for hosting this with GPUs, the cost is tremendously less, the latency is a lot less. And when you’re doing really big numbers every month, it really cuts it down to almost nothing in comparison. But the craziest thing is, when we benchmarked this using A/B tests, we found that users preferred our models over the very large LLM models that we were using with external APIs. And we actually used several external APIs for this testing.

And yeah, it seems that a lot of times if you take models and make your own models or fine tune models, they can have a smaller number of parameters, but at the same time, they can perform better because they’re very domain specific for a task. And I think that’s where most of my bread and butter comes from, from that experience. And that’s also what we are doing with Hey LLaMA. And I advise a lot of companies on this. I hope I’m not giving all my secrets away and people will still give me money for this. But in the end, it’s really about getting people off that addiction to external APIs, which can be very costly and have a lot of latency that users complain about, to creating very, very specific models for the use case that they have. It’s just like the NASA slogan of the nineties: faster, better, cheaper.

And that’s what we’re really focused on there. And that’s exactly what we’re doing as an AI strategy with Hey LLaMA. And that’s why it creates profitability for companies, because one thing I hear a lot with companies is they want to grow their user base and they don’t care about profitability, and they end up spending so much money. But I think you can also be profitable at the same time if you cut down those costs. And so, I think that’s a big advantage to a lot of these open-source models, that you can run them yourself for business use cases. Not all of them, you have to watch out for those licenses, but you can run them yourselves and you can save money. Now, of course, at the same time, a lot of these closed source ones, like I said, they’re so much easier to use, and I highly recommend them for rapid prototyping and seeing what would be sticky with your users. I think that’s a great use case. But in the end, it’s a question of time and money.

DD: Sorry. You were originally using GPT-3 then?

BC: Yes, yes. We were using GPT-3 originally when that came out back then. And that’s what we were using originally for this service. And of course, we also had to create a special agreement with our users for this feature to opt in, because we were passing that data to a third party. And that was a consideration for us. And I mean, in the end, we also used a lot of other resources available there. We also used a company called Aleph Alpha, and we worked with them. And when GPT-4 came out, we tried using that too, for the same use case. We did fine tuning with GPT-3, because a lot of people seem to overlook that you can also fine tune those models and you don’t just need to do prompting with them. And a lot of different types of tests, a lot of different things we tried out there. But in the end, we found that our in-house models actually work better, both for users and also in our internal tests that were double-blind with our language experts.

DD: And would you say that the cost differential from OpenAI’s models to your own model is what, 10 to 1? 5 to 1?

BC: Oh, gosh. It’s massive. It’s massive. I mean, honestly with our users the way they were using this feature so heavily and how many more users kept signing onto it, it would’ve just been a matter of time before the company wouldn’t have been able to pay for that anymore, I think, honestly. And I would say, let’s say something like 10 to 1 maybe.

PD: And how much, you mentioned already that reducing the parameter space is key. How about reducing the amount of training data? Is that something that you guys tried out, like shifting to higher quality data but less training data? Yeah?

BC: Absolutely. You do have to have a balance there. And I mean, it’s similar to what we talked about with our first business case of general grammatical error correction. It’s always about the quality of the data. But with rephrasing it’s a bit trickier, because what is more formal, what is less formal, how do you define the formality of a sentence? How do you define whether a sentence is simpler, or any sort of style? That can be quite subjective. And it’s hard to get agreement between people about sentence formality. And so, in the end it’s really, really heavily based on user acceptance and A/B testing with a large swath of users. We do, of course, internally check these things out, but in the end, yeah. It’s quite hard to measure these things.

PD: Makes sense, huh?

DD: So, Bartmoss, before we wrap up, do you want to quickly give an overview of your open-source community?

BC: I would love to. This is a big passion for me. And I mean, we all do this in our free time, this isn’t like a work thing for us. At Secret Sauce AI, we’ve been doing that for, gosh, I guess a couple of years now. And we’re mainly focused on voice assistants, but any type of AI, and we really focus on trying to help open-source developers understand and use NLP and create tools for them to better use NLP in their own open-source projects. We are a conglomerate of many different projects. There’s of course, Mycroft, which I think is the biggest one. We have a lot of people from the Mycroft community. I’d actually love to give a special shout out to JarbasAI. He’s kind of the heart of that community.

There’s OpenVoiceOS, which is connected to that. There’s Leon AI, Sapphire, Athena, Lily, a lot of different little projects out there. And we all kind of come together and just hack it out whenever we have time. And unfortunately, we don’t always have so much time, but we squeeze in our time here and there to work on our different open-source projects. And if anyone is ever interested in doing a collaboration, you can reach out to me. We always welcome that, whether it’s ASR, NLG, NLU, TTS, even computer vision, generation of images, things like that. We work in all kinds of areas and we love to teach people and build tools to support the open-source community and further that.

DD: Awesome. And we will put the links for that into the show notes. So, sadly, that concludes today’s episode. Just before we leave you, I want to quickly mention our magazine, The Data Scientist. We will feature a version of this conversation as an article in a forthcoming issue of the magazine. The magazine is packed full of articles detailing what some of the world’s most successful companies are doing in relation to data and AI, and you can subscribe for free at So, Bartmoss, thank you so much for joining us today. It was an absolute pleasure and it was an amazing conversation.

BC: The pleasure’s all on this side, Damien and Philipp. Thank you so much for inviting me and having me here today.

DD: And thank you. Philipp, thanks to you.

PD: Thank you.

DD: Do check out our other episodes at, and we look forward to having you with us on the next show [upbeat music].