
Episode 24

The Evolution of GenAI: From GANs to Multi-Agent Systems

Martin Musiol

Episode Summary

Early Interest in Generative AI

  • Martin’s initial exposure to Generative AI in 2016 through a conference talk in Milano, Italy, and his early work with Generative Adversarial Networks (GANs).

Development of GANs and Early Language Models since 2016

  • The evolution of Generative AI from visual content generation to text generation with models like Google’s BERT, and the increasing popularity of GANs in 2018.

Launch of GenerativeAI.net and Online Course

  • Martin’s creation of GenerativeAI.net and an online course, which gained traction after being promoted on platforms like Reddit and Hacker News.

Defining Generative AI

  • Martin’s explanation of Generative AI as a technology focused on generating content, contrasting it with Discriminative AI, which focuses on classification and selection.

Evolution of GenAI Technologies

  • The shift from LSTM models to Transformer models, highlighting key developments like the “Attention Is All You Need” paper and the impact of Transformer architecture on language models.

Impact of Computing Power on GenAI

  • The role of increasing computing power and larger datasets in improving the capabilities of Generative AI.

Generative AI in Business Applications

  • Martin’s insights into the real-world applications of GenAI, including customer service automation, marketing, and software development.

Retrieval Augmented Generation (RAG) Architecture

  • The use of RAG architecture in enterprise AI applications, where documents are chunked and queried to provide accurate and relevant responses using large language models.

Technological Drivers of GenAI

  • The advancements in chip design, including Nvidia’s focus on GPU improvements and the emergence of new processing unit architectures like the LPU.

Small vs. Large Language Models

  • A comparison between small and large language models, discussing their relative efficiency, cost, and performance, especially in specific use cases.

Challenges in Implementing GenAI Systems

  • Common challenges faced in deploying GenAI systems, including the costs associated with training and fine-tuning large language models and the importance of clean data.

Measuring GenAI Performance

  • Martin’s explanation of the complexities in measuring the performance of GenAI systems, including the use of the Hallucination Leaderboard for evaluating language models.

Emerging Trends in GenAI

  • Discussion of future trends such as the rise of multi-agent frameworks, the potential for AI-driven humanoid robots, and the path towards Artificial General Intelligence (AGI).

Resources

Martin Musiol LinkedIn – Martin Musiol | LinkedIn

Generative AI – https://generativeai.net/

Martin’s book – Generative AI: Navigating the Course to the AGI Future

Data & AI Magazine – Media – Data Science Talent

Transcript

Speaker Key:

DD Damien Deighan

PD Philipp Diesinger

MM Martin Musiol

00:00:00

DD: This is the Data Science Conversations Podcast with Damien Deighan and Dr. Philipp Diesinger. We feature cutting edge data science and AI research from the world’s leading academic minds and industry practitioners so you can expand your knowledge and grow your career. This podcast is sponsored by Data Science Talent, the data science recruitment experts. Welcome to the Data Science Conversations podcast. My name is Damien Deighan, and I’m here with my co-host, Philipp Diesinger. How’s it going, Philipp?

PD: It’s going well, Damien. Thanks. Pleasure to be here.

DD: Today we’re talking to Martin Musiol about how to build successful GenAI products in the business world. By way of quick introduction, Martin’s academic background is in engineering and computer science, and he’s been coding since 2006. His professional experience includes working for some of the world’s leading companies, including household names such as IBM, Airbus and Mphasis. He also got involved in the startup world and in the past had his own NLP startup. And of course, he has been working in the field of Generative AI since 2016, which is an absolute lifetime in this

 

00:01:22

discipline. He’s also the creator of the world’s first online course in Generative AI, and his book on the subject was published by Wiley.

He’s also the organizer of the Python Machine Learning Meetup in Munich, and the creator of the influential newsletter Generative AI – Short & Sweet, which now has over 40,000 subscribers. So, I can safely say that we have with us today one of the preeminent practitioners in the field of Generative AI. Martin, we can’t wait to talk to you. Thanks for joining us. And how are you doing?

MM: Thank you so much for having me. I’m looking forward to the conversation.

DD: So, if we start back in 2016, that was long before anyone was talking about Generative AI, what was it that made you get interested all the way back in 2016?

MM: So, in 2016 I had a conference talk in Milano, Italy, which was titled Generative AI: How the New Milestones in AI Improved the Products and Services we Built. I was at that time at Flock Design, a design consultancy. That was my first job; I was a data scientist just coming out of university. And I had stumbled upon a paper from 2014, actually, Generative Adversarial Networks. That was the vanilla GAN by Ian Goodfellow. It was in the back of my head, and I thought a lot about it.

And in that design world I was exposed to, things were popping up, like the first results of GANs, yeah, actual visual results. And I saw, hey, you can actually generate images with it. At that time, the images were

00:02:56

very bad, but it was clear to me that at some point, though the timeframe was unclear, this would become indistinguishable from real humans or real animals, or whatever is being generated. So, I started taking this and talking about it. And I also took the concept of 3D object generation and a couple of other fields, and talked about that in the design context.

DD: So, very much in the right place at the right time, I guess, given that was one of the early use cases. What happened after those first couple of years, and when did you really start to see this technology come of age?

MM: Talking about Generative AI created a local buzz at that specific conference, it was a data-driven innovation conference. It was really interesting, there were engaging conversations, but then it faded out because there was no proper business value behind it yet. Also, it was not about text generation, it was mostly about visual content being generated, because language models were not that good at the time. In 2017, the first really impressive language model came out, BERT by Google; at that time I was at IBM. We actually used that BERT model to implement a specific use case for a client in the geological context. In 2018, I saw that there was some traction coming up in the papers, more and more GAN papers being submitted, and lots of different use cases also on the language model side.

So, I decided to, A, build the generativeai.net webpage, and B, build an online course. Yeah. I teased it on Reddit back then and on Hacker News, and it

00:04:43

went a bit viral. Not too much, but a bit beyond my expectations. And so, I had a large email list, I decided, okay, yeah, there is actually interest in it, and I went for it. A lot of things have happened, yeah, but I think it really got pushed into mainstream attention, the larger-scale attention on Generative AI, when I saw the first bump in my webpage analytics. The first bump I saw was for DALL-E, in the summer of 2022, before ChatGPT came out. There was already an exponential bump on my webpage, generativeai.net. And then with ChatGPT at the end of the same year, it was again much more exponential, and many things have happened since then. Yeah.

DD: So, you got there even before Coursera, on the GenAI online courses?

MM: Yeah, [laughter]. To my knowledge, yes.

PD: Great. So, Martin, in your view, how would you define the term Generative AI? What does it mean? I think it’s something that has changed a lot over time, right? It has gone through an evolution, like you already mentioned. But in your perspective, what are the defining factors of Generative AI?

MM: There are many different perceptions. How I would describe it is: there are models that can discriminate between different options or selections. This is more the traditional part of AI and machine learning, where we have classification, regression, reinforcement learning is also part of it, dimensionality reduction. So, there is a whole bunch of different Discriminative AI use cases. And what is driving AI now is the Generative AI models, where we actually generate something. We generate text from

00:06:42

scratch, images, videos, if you have images in sequence, 3D objects, yeah, and so much more.

I think it’s also interesting to see that when we want to judge someone, judging is easy, but actually having a certain skill is really hard. In the same way we can look at Discriminative AI, which early on was like judging between different options. That’s relatively easy compared to actually generating data, sequential or parallel data. So, that’s why I think there was this shift from when Discriminative AI was more dominant in 2014, 2016. Of course, it still has use cases, recommendation engines for instance, and so forth. Yeah. They drove so much revenue globally, and now it has shifted to Generative AI. And I think Generative AI is not even close to its potential, what it can do.

PD: So, if I hear you correctly, you’re basically saying it’s a new type of AI capability that focuses on generating some form of content or information or so.

MM: Correct. Yeah.

PD: And we already talked about the fact that you have been part of this journey from the very beginning. From your perspective, could you take us through the evolution of GenAI that you have been witnessing? We already mentioned some of the key moments, like Transformers and attention. You mentioned that in 2017 there was a famous paper called Attention Is All You Need. How did these key events shape the evolution of GenAI? And how did you experience it?

00:08:19

MM: In the evolution, I think I would distinguish between sequential data generation, so text, music, sound, code, etcetera, and parallel data generation, so images. On the sequential data generation side, before the Transformer models, LSTMs were quite good, or sequence-to-sequence models. They were good, and we also used them for different kinds of projects. I actually saw this shift in a real project, when this paper came out, and the BERT architecture, I think, right after that. Or it was even released at the same time, I’m not really sure. And we used that BERT, which is a Transformer, in a very early version. We needed to identify specific words, and initially we used regex, which worked to like 80% accuracy.

Then we used an LSTM, which was maybe 90% good. And then with the Transformer, we got to really 99% good. And that showed me, oh, okay, this is very interesting: a completely new architecture, and we are already so much better than existing models. Interestingly, it was Google that came up with the Transformer. If they had doubled down on that architecture early on, they could have had their ChatGPT moment a year or two later. That didn’t happen, so they open sourced it and OpenAI did it. And over time, this is what Ilya Sutskever, the co-founder of OpenAI, has said. I listened to an interview with Geoffrey Hinton where he was saying that Ilya said this and he didn’t believe it in the beginning.

But now he’s also convinced that the more computing power and the more data you throw at these models and the larger you make them, the better the 

00:10:18

capability. So, this rule roughly works, and so far, I think, it is quite true. And this is basically what happened then: the language models got bigger and bigger. And then at some point, this is what made ChatGPT. ChatGPT also added reinforcement learning from human feedback, where they shaped it into a conversational language model and improved it further so that it really brings value from conversation one. So, that’s the sequential data side, and now they also merge, they can also generate images. So, functionalities are being merged there. Yeah, that’s on that side.

And maybe just briefly on the parallel data generation side: for image generation, GANs were very powerful up until a certain time. And then when Stability AI open sourced their Stable Diffusion model, I think that changed things quite drastically, in that image generation became accessible to everyone. You could download the Stable Diffusion model and generate your own images. It’s basically an algorithm that, in the training process, stepwise introduces noise until the image is very noisy, and then tries to learn the denoising steps until it can actually generate. And it’s conditioned, via CLIP, on a certain text, which is basically the prompt, and it reconstructs an image based on that prompt. This is how it trains, and then in production you can use it with just prompting and it generates the image.
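
The denoise-and-prompt loop described here can be tried in a few lines. A minimal sketch, assuming the open-source Hugging Face diffusers library and a public Stable Diffusion checkpoint (neither is something Martin specifies), might look like this:

```python
# Illustrative sketch of prompt-to-image generation with an open Stable Diffusion
# checkpoint via the `diffusers` library. Model ID and settings are assumptions.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # any open SD checkpoint works here
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # assumes a GPU; use .to("cpu") with float32 otherwise (much slower)

# The text prompt conditions every denoising step described above.
image = pipe(
    "a container ship entering a foggy harbour at dawn, photorealistic",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

image.save("generated.png")
```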

PD: And in your view, what are some of the key technologies that have been driving GenAI and are currently driving the advancement in GenAI?

 

00:12:06

MM: So, up until last year, I was the Generative AI lead for Infosys in EMEA. And roughly 99% of the client asks for AI projects included something like: we want to query against a knowledge base. We have implemented a couple of those projects; now I’m on my own, not associated with Infosys anymore. But the core of these architectures is basically a Retrieval Augmented Generation architecture, RAG for short. And what you do there is you have these large corpora of knowledge.

You take the documents and chunk them into sizable chunks, and then there is the query that you put in. Let’s say, one we have built with a global transport corporation that ships containers from A to B across the globe. Their clients write lots of emails asking, what are the regulations in this country? What are the regulations in that other country? What are the overarching rules? And to answer these emails, someone has to first understand them, then go through all of the documentation, take the respective part of it, write it all up together and send it out. So, this can take hours.

And what we have done is, we built a chatbot where they copy and paste in the email. The email’s intention is extracted, and this intention is then queried against a vector database, where we have all of these chunks I mentioned before, and we retrieve the most relevant chunks. That’s semantic search, basically. And these chunks are then taken into a prompt engineering part, or a prompting template, and sent

00:14:00

or posted to the language model. At that time it was GPT-4, now we have GPT-4o. This then answers very clearly what the intention of that email was. So, RAG architectures are driving most of the applications.
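
A minimal sketch of the RAG flow described here, chunk, embed, retrieve, prompt, might look like the following. The library choices (sentence-transformers for embeddings, the OpenAI client for the language model) and the model names are illustrative assumptions, not the stack used in the project Martin mentions:

```python
# Hedged sketch of a RAG pipeline: chunk documents, embed them, retrieve the most
# relevant chunks for an incoming email, and let an LLM draft the answer.
import numpy as np
from sentence_transformers import SentenceTransformer
from openai import OpenAI

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = OpenAI()  # expects OPENAI_API_KEY in the environment

def chunk(text: str, size: int = 800) -> list[str]:
    """Split a document into fixed-size character chunks (real systems chunk smarter)."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# 1. Index the knowledge base once.
documents = ["... shipping regulations for country A ...",
             "... customs paperwork rules for country B ..."]
chunks = [c for doc in documents for c in chunk(doc)]
chunk_vectors = embedder.encode(chunks, normalize_embeddings=True)

# 2. At query time: embed the incoming email (its extracted intention).
email = "Hi, which documents do we need to ship refrigerated containers to country B?"
query_vector = embedder.encode([email], normalize_embeddings=True)[0]

# 3. Semantic search: cosine similarity against every chunk (a vector DB does this at scale).
scores = chunk_vectors @ query_vector
top_chunks = [chunks[i] for i in np.argsort(scores)[::-1][:3]]

# 4. Put the retrieved context into a prompt template and ask the LLM.
prompt = ("Answer the customer email using only the context below.\n\n"
          "Context:\n" + "\n---\n".join(top_chunks) + f"\n\nEmail:\n{email}")
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(reply.choices[0].message.content)
```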

PD: So, you’re already talking a lot about use cases, especially with embedding, vectorization and chunking of data, that’s a very powerful use case. Coming back a little bit to the question about technologies: in the last eight years that you’ve been experiencing GenAI, did you see any major changes driven by technology? Be it chip design, computational power, GPUs, any of that? Did you have any eureka moments where suddenly something changed purely based on new technologies being available?

MM: Not really, but there is a lot happening in chip design; Nvidia is of course focusing strongly on it. Basically, all of the training investment, and all of these investments into startups and other companies and projects, flows down through the hyperscalers to Nvidia. So yeah, there is a lot happening at Nvidia on the one hand, where they are improving their chips. Then we also have new processing unit architectures, such as the LPU, the Language Processing Unit. Groq is a company that challenges Nvidia with that, though not really on the same level. I haven’t fully understood how they do it, but they have lower latencies. So, yeah, on the processing side, a lot is happening.

 

00:15:47

PD: Yeah. And you already mentioned vectorization and embedding of massive amounts of documents; I think that’s a very important corporate, large-enterprise use case at the moment, especially, as you mentioned, in the context of regulatory issues or problems. Are there any other real-world applications that you see taking off already?

MM: Yeah. There are lots of tools in marketing. Maybe one step back: roughly a year ago, there was a survey from McKinsey where they mapped business functions against industries, and they showed which business function and which industry has what potential of being disrupted by GenAI. Across industries, it was marketing, which makes sense. Marketing was number one, but there I see more of a disruption through SaaS products, tools for writing fast, or Grammarly, this and that. But where I see a lot of real enterprise applications being developed is customer success or customer service. So, where we have lots of emails coming in, for instance an email saying, hey, I have changed my address. You could write that via email, and today a human reads it and writes it into the database.

But with the current AI that we have, current language models, you can extract that information and check it against databases, or even against other language models. We have also built a system where language models are checking each other and voting, and then writing the result automatically into a database. This even works when customers send a screenshot of their new address or their bank account, which happens in customer service quite a lot. This is my new address, this is my new bank account, this is my contract, this is how much I consume. And

00:17:44

all of these interactions can be mostly automated. I would even go so far as to say that calls can be automated to a degree. If you call, I don’t know, your internet provider and say, hey, I just want to let you know I have changed my address, you can say this to an AI and the AI takes it on.
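
The extract-and-vote pattern Martin mentions for customer-service emails can be sketched roughly as follows. The model names, the JSON schema and the majority-vote rule are assumptions for illustration, not his team’s implementation:

```python
# Hedged sketch: several LLM passes extract the new address as JSON, and we only
# write to the database when a majority of the answers agree.
import json
from collections import Counter
from openai import OpenAI

client = OpenAI()

EXTRACTION_PROMPT = (
    "Extract the customer's new postal address from the email below. "
    'Reply with JSON only, e.g. {"street": "...", "city": "...", "postcode": "..."}.\n\n'
)

def extract_address(email: str, model: str) -> dict:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": EXTRACTION_PROMPT + email}],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

def extract_with_voting(email: str, models=("gpt-4o", "gpt-4o-mini", "gpt-4o")) -> dict | None:
    """Run several extractions and accept the answer only if a majority agree."""
    answers = [json.dumps(extract_address(email, m), sort_keys=True) for m in models]
    winner, votes = Counter(answers).most_common(1)[0]
    return json.loads(winner) if votes >= 2 else None  # None -> route to a human

email = "Hello, please note I moved. New address: 12 Harbour Lane, 80331 Munich."
address = extract_with_voting(email)
if address:
    print("Write to CRM:", address)   # in production this would be a database update
else:
    print("Models disagree; escalate to a human agent.")
```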

PD: You already mentioned, taking a step back, that GenAI will affect many different industries, I think that’s a general expectation. And you already mentioned functions like marketing; are there any other insights you have on this? Where would you expect the biggest impact to be, in industries like healthcare, finance, entertainment, whatever it is?

MM: Yeah. Software development has a huge potential for AI disruption. I’m developing an app where you can drop in 10-Ks and 10-Qs, the financial statement filings. It extracts the company hierarchy of a corporation, with all of its subsidiaries, basically from all of the documents that you drop in, and then it draws the hierarchy. I started this project this morning and it’s at 90% complete today. What I need for this is the front-end part, some backend part, communication to a language model, and also a RAG system.

Of course, we’re not speaking about an enterprise-grade, perfectly SOC 2 compliant SaaS product, but the POC phase. And how did I do it? I used Claude 3.5 Sonnet, where you can build your own projects. With natural language only, I described what I want to develop. I thought clearly about what I want to have and what steps are needed to make this product actually functional, described

00:19:37

it well, a couple of iterations, then there were errors and I had to copy and paste them back. But it’s 80-90% good; as a POC it works. And this is the weakest model we will have going forward. Honestly, I think learning Python and C++ will not be a must for software developers in the future.

PD: It hurts to hear that, [laughter]. But I can see that it might be a reality, yeah.

DD: I would like to take a brief moment to tell you about our quarterly industry magazine called The Data Scientist, and how you can get a complimentary subscription. My co-host on the podcast, Philipp Diesinger, is a regular contributor, and the magazine is packed full of features from some of the industry’s leading data and AI practitioners. We have articles spanning deep technical topics from across the data science and machine learning spectrum. Plus, there’s careers advice and industry case studies from many of the world’s leading companies. So, go to datasciencetalent.co.uk/media to get your complimentary magazine subscription. And now we head back to the conversation.

PD: Would you have any advice for larger corporations in terms of dealing with or integrating commodity AI into their AI strategy?

MM: Yeah. That’s actually a very good point, because it doesn’t have to be a $1 million project. I was in consulting basically my whole professional life, and right now I have my own little consultancy. We are building products and we talk with a lot of clients. So, I always advise to start, if you are not already in it. Start lean; you need a little bit of technical expertise, or you can work with a consultancy on a small POC to get started.

00:21:38

And I’m disregarding for now that you might also want to think about the role of your cloud strategy. But even with a cloud, you can get started quickly.

You just need a credit card, and you can set up a couple of services and get started with a POC, and that’s not even expensive. There are various sources where you can get a RAG system with a vanilla architecture where you put your documents in. Even with Azure, for instance, there is a service, Azure AI Search I think it’s called, where you just place documents and they give you a very sophisticated, but on the front end very simple, semantic search that is very powerful. And you can basically query all kinds of internal documents.

And in Azure, or any other platform like AWS, you can build it in professional ways with your documents. Especially in Europe, there’s a bit of a tendency to be afraid that the documents will be used for training models, or that they somehow get leaked to competitors or other companies in these professional hyperscaler setups. This is not really the case once you read more deeply into how to set it up. And you can use solutions almost out of the box to get RAG systems going, where you have your documents and then with a language model you answer questions against a certain knowledge base, or knowledge asset. That definitely works. And of course, there are also proper SaaS products that can be used.

 

00:23:26

DD: Can you give us an overview, Martin, of small language models and how they compare to large language models and their deployment in the enterprise?

MM: Yeah. So, small language models are models that have a significantly smaller number of trainable parameters. We know that GPT-3.5, for instance, had 175 billion trainable parameters. A small language model such as Phi-3 from Microsoft has about 3 billion trainable parameters. Compared to the early times in the space, this is still very large, because I think BERT had like 100 million or something, 89 million, 98 million, something like this. So, Phi-3 is, compared to that, quite large. But in comparison to GPT-3.5, and I think we are still not fully sure how big GPT-4 is, or I missed that information, but it’s for sure two orders of magnitude larger than that. And the funny thing is that well-trained, or well-prompted, small language models are actually very comparable in performance, in specific, carefully chosen use cases, to large language models like GPT-4 or Claude 3 Opus.

What can also be seen there is the hallucination that happens a lot. I think this is not 100% verified, but for a long time there was the narrative that the larger a language model is, the more information it holds, because it needs more information to train on, but also the more likely it is to introduce certain hallucinations. I’m not 100% convinced of that. However, small language models are, A, very comparable; B, extremely fast, so when latency matters, small language models could be a great choice; and C, they are not as expensive, computationally, in running and

00:25:37

training as well. Also, most likely it’s even an open-source model that you can download and run on your own computer. Phi-3, via Ollama, you can download it on your laptop and chat with it right away; it works. And it depends, but I would say in 99% of the cases there is a smaller model that can be as good as a large model in a given use case.
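
Running a small model locally really is this simple. A quick sketch against Ollama’s local REST API, assuming Phi-3 has already been pulled with `ollama pull phi3` and the Ollama server is listening on its default port:

```python
# Minimal local chat with Phi-3 via Ollama's REST API (defaults assumed).
import requests

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "phi3",
        "messages": [{"role": "user", "content": "Summarise what a RAG system does in two sentences."}],
        "stream": False,  # return one complete response instead of a token stream
    },
    timeout=120,
)
print(response.json()["message"]["content"])
```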

PD: Martin, you mentioned earlier a couple of challenges in setting up GenAI systems. There are different dimensions to it, obviously: there is a whole data dimension, prompt engineering, setting up proper networks and processes of agents interacting with each other, and so on. From your perspective and your practical knowledge, what are the most common challenges that people face when they’re trying to get their hands dirty with GenAI systems, trying to set them up, maybe even training or fine-tuning them? What is your experience in that field?

MM: Yeah. When it comes to training, when we speak of training from scratch, it’s very costly. For large language models it can be many, many millions; for the frontier models, probably even beyond, or soon beyond, a billion. And when we talk about fine-tuning, this can also be very costly if it’s a large language model. There are open-source models like Llama 3, and I think very soon Llama 3 405B, so 405 billion trainable parameters. And if you want to just fine-tune that one, well, open source stands for democratization of this technology, but where’s the democratization? You already need extreme resources just for fine-tuning it. That’s less of a problem with small language models.

00:27:31

So, when it comes to fine-tuning, the size of the model is one problem. You also always want to check where a model comes from; you don’t want to just use any model, because there are actually more and more problems in terms of jailbreaks, or sleeper agents introduced into language models that we don’t know about. There are so many language models out there, but of course in mainstream media we hear about the OpenAI ones, Anthropic, Google and a couple of others. But if the source is not well known, all sorts of things can be introduced, for instance sleeper agents. Should I quickly elaborate on what sleeper agents are?

PD: Yeah, please do.

MM: So, sleeper agents, this is what Anthropic research has actually shown: in the training they introduced a character sequence, and whenever this character sequence is hit, this can be __something__, then it starts to insult or really disregard any guardrails and just write whatever. They have seen that no matter how hard you guardrail it or prompt it, if this sequence is introduced, there is some probability that it starts switching into this dark mode again. And you don’t want to have this in a professional setup of that language model, maybe even interacting with customers. So, you really want to make sure of the sources it comes from. And they have even seen that this behaviour still shows up after fine-tuning the models.

One thing in implementation, Philipp, that you also asked about: I had a client, this was the early times of GPT-4 Turbo, which was also quite expensive and used a lot with large messages.

00:29:23

and uses a lot with large messages. And we had a client that had one month, trial month. It has been used so often with our application that it, because it was publicly available, that it costs half a million on this one month, just for API costs. So, if you release it and you just let it there, and people can interact as much as they want. Each communication costs money, and so, you want to be aware in the design of that. Also, there are many bots on the internet, the internet is a very dirty place. Maybe you want to secure that with login only and stuff like this. So, lots of design thinking has to get in.

PD: Yeah. How about the quality or quantity of data, what role does that play? We already talked about small language models; reducing the amount of training data and so on can be vastly important. What’s your experience there?

MM: So, the first thing I would like to say is that if you work with a well-trained language model, via ChatGPT or with Claude 3, they are actually very capable of dealing with noisy data. And if you even raise the awareness of the language model that, hey, the data you will see from now on is quite noisy, so please just focus on the relevant parts of the text that provide content, and then maybe you also describe the kind of content, then it works quite well. So, that’s one thing. And also, prompt engineering is a science in itself. There are regularly papers coming out; it has slowed down a little bit now in terms of prompting papers, but there was a huge wave of that, I felt.

 

00:31:12

And I had a colleague who was literally splitting his day: half the day reading papers on prompt engineering, and the other half going from project to project to just do the prompt engineering. But when we speak about RAG applications and you want to query knowledge data, I would definitely make sure that we have clean data in there, especially when we are speaking about large amounts of data. Not every single character has to be right, it’s still good at dealing with noisy data, but getting the chapters that don’t carry relevant information out of that knowledge asset can be very helpful.
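
A minimal version of the “tell the model the data is noisy” instruction described here might look like the prompt template below; the wording is an illustrative assumption to adapt to your own documents:

```python
# Illustrative prompt template for querying noisy, automatically extracted context.
NOISY_CONTEXT_PROMPT = """\
The context below was extracted automatically from scanned documents,
so it contains OCR noise, broken tables and boilerplate headers.
Ignore the noise and answer using only the parts that carry real content
(regulations, dates, requirements). If the answer is not in the context,
say so instead of guessing.

Context:
{context}

Question:
{question}
"""

prompt = NOISY_CONTEXT_PROMPT.format(
    context="R3gulat1on 47-b: refr1gerated containers require ...   [PAGE 12 FOOTER]",
    question="What is required for refrigerated containers?",
)
# `prompt` would then be sent to the language model exactly as in the RAG sketch above.
```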

PD: And so, let’s talk a little about the performance of GenAI systems. What are good ways, or metrics, of actually measuring performance?

MM: You know, this is actually a complex topic, because there are these different benchmarks, MMLU, Q&A benchmarks and so forth. But it has also been shown that this information gets included in the training data, and there is not a clean separation between the training data and the testing data. So, some of the benchmarks are, I would say, questionable. But what works well is the comparison between the language models. This is where the Elo rating comes in; it’s originally from chess. And there is the Elo rating on the LMSYS leaderboard, hosted on Hugging Face. They have a great webpage that they update, I think, on a daily or every-other-day basis, and they show which model is performing best against the others.

 

00:32:51

So, they can create this ranking quite well. And last time I checked, GPT-4o was number one in general terms. Yeah. And there is also a second interesting board, called the Hallucination Leaderboard, also hosted on Hugging Face. It shows, according to different tests, what percentage of the time certain language models hallucinate, I must also say, without context information. So, unlike in RAG systems, where you provide context information from the documents, here that is not necessarily the case. Number one with the lowest hallucination rate was also GPT-4, and then a Salesforce one and so forth, so, interesting to look at. I would look at these two to get a first feeling for this space.
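
The Elo mechanism behind such arena-style leaderboards is simple to sketch: after each human-judged head-to-head, the winning model takes rating points from the loser. The K-factor and starting ratings below are conventional choices, not the exact values any particular leaderboard uses:

```python
# Sketch of the Elo update used for pairwise model comparisons.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update_elo(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    ea = expected_score(rating_a, rating_b)
    sa = 1.0 if a_won else 0.0
    return rating_a + k * (sa - ea), rating_b + k * ((1 - sa) - (1 - ea))

ratings = {"model-x": 1000.0, "model-y": 1000.0}
# Suppose human raters prefer model-x in 3 of 4 pairwise battles:
for a_won in (True, True, False, True):
    ratings["model-x"], ratings["model-y"] = update_elo(ratings["model-x"], ratings["model-y"], a_won)
print(ratings)
```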

PD: And what are some misconceptions about GenAI that you would like to clarify? Or some common questions that you get a lot?

MM: Currently, I must say, for a couple of weeks now it has been Claude 3.5 Sonnet, and soon Claude 3.5 Opus, with all of the tools around it. So, they have Artifacts and Projects. Artifacts means it can code a lot and it writes the code right into documents. Within a project you can build these different documents, and you even have a publishing feature. So, when you have a web app that you are building, you can literally build it just with your thoughts, click on publish and you have a unique URL. This is quite crazy, I must say, that’s what’s happening there. And to everyone, I recommend trying it out. I have no association with Anthropic, but this is what really gets me going these days.

00:34:42

PD: So, we are already talking a little bit about the future of GenAI and the emerging trends and so on. Are there any other emerging trends in GenAI that you are excited about, or that data scientists and business leaders should watch out for?

MM: So, yeah, we see it already, and I also write about this in my book: the agentic future. We will have agents, with language models at the core, and then they have certain function calling, connections to APIs. Ideally even in a multi-agent framework, so we have multiple agents communicating with each other, with maybe a project manager agent on top, a quality assurance agent, a communication agent, and a developer agent, front-end, back-end, whatever. And these multi-agent frameworks, for instance crewAI is one multi-agent framework, they are super, super powerful.

Andrew Ng, in fact, says that we already have GPT-5-level performance: it’s GPT-4o in a multi-agent framework, that’s his reasoning. And I agree, together they’re stronger. And I believe that in the future, in the professional as well as the private space, we will walk around with a fleet of AI agents that will perform all kinds of things for us. They will reach out to other agents and to webpages and plan; they will act like our executive assistants. They will plan our trips, they will take over certain kinds of projects, like product development. I think the future is very bright in that regard. Yeah.
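
A small multi-agent setup of the kind described here can be sketched with CrewAI, one of the frameworks Martin names. The roles, goals and task wording are illustrative assumptions, and CrewAI calls a language model behind the scenes, so an API key is expected in the environment:

```python
# Hedged sketch of a two-agent crew: a developer agent drafts a plan, a manager reviews it.
from crewai import Agent, Task, Crew

manager = Agent(
    role="Project manager",
    goal="Check the developer's output for completeness and production-readiness",
    backstory="Keeps the other agents on scope and on deadline.",
)
developer = Agent(
    role="Developer",
    goal="Write a short technical plan for the requested feature",
    backstory="Pragmatic engineer who prefers simple architectures.",
)

plan_task = Task(
    description="Plan a chatbot that answers shipping-regulation emails using a RAG backend.",
    expected_output="A numbered implementation plan with the main components.",
    agent=developer,
)
review_task = Task(
    description="Review the developer's plan and list anything missing for a production rollout.",
    expected_output="A short review with concrete gaps.",
    agent=manager,
)

crew = Crew(agents=[manager, developer], tasks=[plan_task, review_task])
print(crew.kickoff())
```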

And agents will communicate with each other, my agents with your agents, to discuss a rate for a conference, I don’t know. It’s very exciting. And of course,

00:36:26

what we see, and this is maybe not the number one priority for current business leaders, is that 2024 is the year we have a lot of these technical convergences. 2024 is the year where humanoid robots are taking off. We see, of course, Elon Musk with his Optimus, and Figure collaborating with OpenAI, where they basically put GPT-4o, with its visual capabilities, into the head of their humanoid robot. And the same is happening in China, quite a lot. And also, having a physical embodiment of an AI model and being able to interact with the world is helpful on the path towards AGI.

DD: Can you elaborate on what you just said about AGI, Martin?

MM: Yes. So, the question is, what is the path forward for AI to get better, to maybe become this AGI, Artificial General Intelligence? And it has a lot to do with understanding the world. If we only understand the world via text, well, text is just an approximation of the real world. If I tell you, for instance, that I lived in the jungle for three months, and I explain many different scenarios of how it was to live in the jungle for three months, you have a good understanding of what it is, but you don’t really know what it is unless you yourself live three months in the jungle. So, that’s the approximation of text.

So, being able to close that gap from approximation to reality, that is, I think, key towards AGI. I elaborate on this more in my book. It is also the visual stream, the microphone stream, maybe also touch through the embodiment of a robot. This is interacting with the physical world. And all of

00:38:26

these different sensors: multimodal, multi-sensor, and multitasking is also what I’m saying, because our brains basically also multitask; we breathe at the same time, and we think at the same time while we do something. But multitasking, actually, there’s already a checkbox behind that. These three things are key to getting closer to it, on as many channels as possible.

DD: Super interesting. Yeah. I think we can do a whole podcast on the path to AGI. So, Martin, you mentioned your book, could you give us a quick overview of the book, please?

MM: Yeah. Very happy to do so. It’s basically split into three parts. A shorter part about the history of AI; actually, Generative AI elements already occurred in the past. For instance, in 1965, Joseph Weizenbaum developed a chatbot called ELIZA. You typed with it, and if you said, for instance, something about your family, it recognised, ah, family, how do you feel about your family? It worked like this. Then the second part is about the current state of AI, or Generative AI: what are the different fields, and especially, what is the untapped field?

There I’m guiding the reader through a framework for coming up with, or having an educated guess about, where to apply Generative AI from their perspective and build tools, products, ideas. And then the third part, which is probably like 35% of the book, is looking forward. There I’m going towards the future of Generative AI, the future of AI agents, so that’s part of Generative AI: autonomous AI agents, multi-agent frameworks and robots, yeah, humanoid

 

00:40:25

robots that will merge with that. And then towards Artificial General Intelligence, and how do we get there, and how to be prepared.

DD: Awesome. And we’ll put a link in the show notes, but presumably people can find it on your website, Amazon and the standard outlets. And just the title again is…

MM: Generative AI: Navigating the Course to the AGI Future.

DD: Awesome. And of course, for those who are also interested, you’ve got a very fast-growing, influential Generative AI newsletter, also available on your website. Can you give us a quick overview of your newsletter and what people can expect?

MM: Yes. It’s called Generative AI – Short & Sweet. Yeah. And it’s twice a week, Tuesday and Friday. On Friday, I wrap up the week with my top AI findings; I’m basically reading about AI all day. And on Tuesdays, I dive a bit deeper into specific topics. So, we have dived into how small language models, for instance, are in 90% of cases the better choice, or into Llama agents, how to apply different kinds of AI agents, and so forth.

DD: And that is also on your website, they can sign up for the newsletter there.

MM: Correct.

DD: And I can vouch, I am a subscriber to the newsletter, it’s really, really good. It’s not one of these AI generated things from somebody who was in marketing three years ago and has jumped on the bandwagon. This is the 

00:42:02

real deal. With this newsletter, Martin has been there since the inception, so it’s well worth checking out. You will get stuff that you haven’t seen elsewhere. That’s my experience.

MM: Thank you so much.

DD: So, sadly, that concludes today’s episode. Before we leave you, I just want to quickly mention our industry magazine, which you can find on our website at datasciencetalent.co.uk/media. It’s packed full of insight into what some of the world’s leading companies are doing in relation to data and AI. And again, it’s totally free. And part of this conversation will be written up into a magazine article for you to enjoy, so do check that out. Martin, thank you so much for joining us today. It’s been a pleasure listening to your insights, especially from someone who’s been in this discipline for as long as you have. Thank you for coming on the show.

MM: Thank you so much for having me. Really appreciate it.

DD: Great. And thank you also to my co-host, Philipp, for his excellent questions. And of course to you for listening. Do check out our other episodes at datascienceconversations.com, and we look forward to having you on the next show.