Episode 16

Mapping forests: Verifying carbon offsetting with machine learning

Heidi Hurst

Description

In this episode Heidi Hurst returns to talk to us about how in her current role at Pachama she is using the power of machine learning to fight climate change. She discusses her work in measuring the capacity of existing forests and reforestation projects using satellite imagery.

Show Notes

Resources

Episode Summary

The importance of carbon credits verification in mitigating climate change
How Pachama is using machine learning and satellite imagery to verify carbon projects
Three types of carbon projects: avoided deforestation, reforestation, and improved forest management
Challenges in using satellite imagery to measure the capacity of existing forests
The role of multispectral imaging in measuring density of forests
Challenges in collecting data from dense rainforests and weather obstructions
The impact of machine learning on scaling up carbon verification
Advancements in the field of satellite imaging, particularly in small satellite constellations

Pachama website:

Pachama are a technology company harnessing the power of remote sensing and AI to protect and restore the world’s natural carbon sinks.

Transcript

DSC 17 Heidi Hurst

Speaker Key:

Heidi works at the intersection of maths, geographic information science, computer science and design. She did her applied maths degree at Harvard followed by a GIS master’s from UCL. She then did a second master’s in mathematical modelling and scientific computing at Oxford University. Her varied work experience spans government, the consulting industry and the start-up world. She is currently working as a machine learning engineer at Pachama, which is using technology to tackle climate change.

So maybe you could just start by telling us what you’ve been up to in the last couple of years, because the field you’re in now is still satellite imaging but in a very different sector.

HH Last time we spoke I was working more on the defence and intelligence side, doing imagery analysis and machine learning to identify objects from satellite imagery. It was an interesting place with a lot of open questions. An opportunity arose to transition to working in climate tech. And I was very excited to find that a lot of these same questions around how do we use satellite imagery, aerial imagery etc, are relevant in the world of climate as well. So a lot of the same tools but a very, very different domain. So I transitioned to working at my current company, Pachama, in March of 2021, so almost two years ago. Pachama in general is aimed at restoring nature to solve climate change. So, we verify carbon credits and sell them. So, there’s a lot of claims about carbon credits that you may see online and just in general. In general the purpose of a carbon credit is to offset an emission. So, say I have a factory and it belches out 100 tons of carbon dioxide or carbon dioxide equivalent and I want to offset that.

Obviously, the best thing we can do for the climate is to reduce the emissions entirely but if we can’t do that, we can offset it. And there’s a number of different ways of doing that. One way is through nature-based solutions, so either regrowing or preserving forests, but there are a lot of difficult technical challenges in doing that. We want to make sure that when people say, “I emitted one ton of carbon, have you captured one ton of carbon?” that math really adds up, and you can imagine given how wild some of these forests are that quantifying this is non-trivial. So, before I go too deep into that rabbit hole, that’s the overview and that’s how satellite imagery can be used.

DD Good. Any thoughts on that, Philipp?

PD Yeah. So, Heidi, you mentioned that you’re verifying this carbon offset, what’s your experience with that, are there many times where the verification fails or is it most of the time it actually can be verified, that it’s legit?

HH Yeah. So, it really depends on the project. There are a number of pillars that we look for when verifying the offsets. One is, are the offsets real? But two is, are they additional? So additional means this wouldn’t have happened without the project being in place, and that’s where we find a lot of carbon projects fail: they’re in areas that would have been protected anyway or they would have regrown anyway. Traditionally verification is done through in-person measurements, which means we send a team of guys into the forest with tape measures and wait for them to come back, which as you can imagine is a really expensive, arduous, time-consuming process. And because of that there are sort of two consequences. One is that it’s difficult to verify carbon projects and therefore it isn’t done very frequently. And two, it means that it’s expensive. And because it’s expensive, smaller landowners don’t have as much of an opportunity to participate. We do find that they fail from time to time and that’s why this technology is important, to give people confidence that these offsets are providing a real tangible climate benefit.

00:05:17

PD So when you say you sent teams into existing forests, the carbon offset is then a kind of a negative or a threat of deforestation that is being averted by providing some financial stimulation, or is it, would it be the other way around, that there has been deforestation and it’s been reverted by planting new trees again?

HH Yeah. So, there are a number of different types of carbon projects. There are ones that are called avoided deforestation projects and that’s saying, we think someone’s going to cut this down, if we get in there and protect it, they won’t. So that’s one category. Then there are reforestation projects so that’s saying this area has been deforested, we’re going to plant it, monitor the trees, make sure they grow back and that will capture carbon. And then there’s also projects that are a little bit more nuanced called improved forestry management that improve the amount of carbon that a forest can hold but they’re not quite as clear-cut as either don’t cut them down or we’re going to regrow them, somewhere in the middle.

PD So regarding the first or the second type I think was the deforestation. So obviously any landowner with a forest can just make claims that there will be plans of cutting something down and then try to make money from it. Is this a scenario that you have to deal with?

HH Yeah, so it is something that we see in the broader market. So the most useful tool in understanding the value of a carbon credit is something called a baseline. And a baseline is kind of a counterfactual, it’s what would have happened anyway. And carbon credits are in general issued based on the difference between what the project developer says is happening versus what they say would have happened anyway. So, the difference between what the project did and what would have happened without it. And so, the reliability of this baseline is incredibly important because if someone comes in with a project and they say, “Everything would have gotten deforested without me.” That puts them in a position to gain quite a lot of carbon credits if that’s true. And so, a lot of our work and our research at Pachama is around understanding different baseline methodologies and ensuring that the projects that we evaluate have a reasonable baseline.
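
The baseline arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not Pachama's methodology: the function name `credited_tonnes` and all of the figures are hypothetical, and real crediting schemes apply further adjustments that are not discussed in this conversation. The point it shows is how strongly the credited amount depends on the counterfactual that is chosen.

```python
def credited_tonnes(baseline_emissions_t, project_emissions_t):
    """Credits as the gap between the counterfactual baseline and the observed outcome.

    Both arguments are tonnes of CO2-equivalent for the same area and period.
    A project that did worse than its baseline earns nothing (never negative).
    """
    return max(baseline_emissions_t - project_emissions_t, 0.0)


# Hypothetical numbers: one observed outcome judged against two baselines.
observed = 10_000.0               # emissions that actually occurred in the project area
inflated_baseline = 100_000.0     # "everything would have gotten deforested without me"
conservative_baseline = 30_000.0  # a more cautious counterfactual

print(credited_tonnes(inflated_baseline, observed))      # 90000.0
print(credited_tonnes(conservative_baseline, observed))  # 20000.0
```

The same forest and the same observed outcome yield very different credit volumes, which is why the reliability of the baseline matters so much.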

PD That makes a lot of sense, yeah. Can you give us a rough number, how many of these projects are reforestation versus deforestation projects in your experience?

HH In my experience I would say there’s a lot of both. Certainly I would say that reforestation projects are flashier and something that we’re seeing a lot of market demand for because they’re very easy to understand and provide a very compelling narrative for clients who are…

PD They don’t need verification basically, yeah.

00:07:56

HH I mean they do need verification because it turns out growing trees is really non-trivial. You have these sort of small areas in plots of land that just don’t grow trees because it’s sort of these microenvironmental conditions. So verification is still important to make sure that trees are growing and that they’re being planted in a way that gives them the chance for success.

PD Are there no alternatives to trees, other plants maybe? Why do trees hold so much carbon versus other plants?

HH Certainly there are a lot of other options, in particular there’s a lot of interesting research going on around blue carbon which is sources of – or sinks for carbon that are based in the ocean. So, seagrasses, kelp forests, mangroves, those are harder to monitor frankly. It’s very difficult to monitor mangroves because a lot of the carbon that’s captured is underwater. So, from a remote sensing perspective we can’t see it in the satellite image. We can’t really get a lidar return on it. It’s very difficult to monitor. So, trees offer, first of all they’re great at storing carbon. They’re just a stellar version. They also offer a relative amount of permanence. So, when we’re saying, “We want to offtake carbon.” We don’t want to offtake it for five minutes and spit it back out again. That’s not providing a tangible kind of benefit. We want this carbon to be stored for preferably a really long time. And if you think about the lifespan of some of these organisms, there are trees that have been around for 1,000 years in the Americas which is incredible. So that carbon is being captured and embodied for a very long time in comparison to shrubs for example which don’t last that long.

PD And how do you use satellite imagery for this?

HH So we use satellite imagery in combination with a couple of other data sources, mostly lidar, to estimate the amount of biomass in a given area. The idea being that if we can estimate biomass at several different points in time we can see how the biomass and therefore the captured carbon has changed and we can use that to evaluate the validity of these claims. So, in general that’s using a lot of multispectral imagery, things like Landsat, or imagery from private satellite constellations to train neural networks to estimate the amount of biomass available. And that requires a lot of training data from what are called field plots, which are these field estimates from people who have gone in with tape measures and measured trees and figured out the amount of carbon that way, which is a difficult and important dataset. Field plots are the gold standard estimate of how much biomass and therefore how much embodied carbon is stored by an area of forest. And a field plot is a forestry inventory that needs to be taken on the ground by teams that go in and measure the diameter of a tree, [10:44] the height of a tree and other forestry characteristics. And then use that to estimate the amount of carbon stored in the trees in that area. So that’s sort of the core training dataset that we use in combination with remotely sensed imagery and then also lidar to develop these models.
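
As a rough sketch of the step from biomass estimates at different dates to a change in captured carbon, the Python below converts dry biomass to CO2 and takes the difference between the earliest and latest estimate. The conversion is an assumption that is not stated in the conversation: a carbon fraction of 0.47 is a commonly used default for woody biomass, and 44/12 is the ratio of the molecular weight of CO2 to the atomic weight of carbon. The function names and the biomass figures are hypothetical.

```python
CARBON_FRACTION = 0.47        # assumed share of dry biomass that is carbon (commonly used default)
CO2_PER_CARBON = 44.0 / 12.0  # molecular weight of CO2 divided by atomic weight of carbon


def co2_from_biomass(biomass_t):
    """Convert tonnes of dry biomass to tonnes of CO2 held in that biomass."""
    return biomass_t * CARBON_FRACTION * CO2_PER_CARBON


def co2_change(biomass_by_year):
    """Change in stored CO2 between the earliest and latest biomass estimate.

    biomass_by_year maps a year to an estimated biomass in tonnes.
    A positive result means the area holds more carbon than it did at the start.
    """
    years = sorted(biomass_by_year)
    first, last = years[0], years[-1]
    return co2_from_biomass(biomass_by_year[last]) - co2_from_biomass(biomass_by_year[first])


# Hypothetical estimates for one plot, e.g. from a model run on imagery from each year.
estimates = {2018: 100.0, 2020: 110.0, 2022: 120.0}
print(round(co2_change(estimates), 2))
```

In practice the biomass figures themselves carry uncertainty, so a change like this would normally be reported with an error range rather than as a single number.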

00:11:05

PD And what are these neural networks trained to predict?

HH A lot of this work is still being developed. So, in some cases we’re training it to estimate canopy height. So, if we can estimate canopy height that can be a useful proxy for biomass. In some cases, we have tried to estimate biomass directly. So, there are a lot of different components to understand these models. And sometimes canopy height can be an input into a larger model. So maybe we try and extract canopy height from imagery and then use some of the hyperspectral bands that get us into chlorophyll for example to sense how green something is and use that as a proxy for how much vegetation there is. So a lot of different things that we’re trying to train for.

PD So if I understand that correctly, so the foliage plays a major role in storing the carbon [11:52], it’s not the roots or something?

HH Yeah. The foliage doesn’t play a huge role in the carbon capture. Most of the carbon is either captured in the body of the tree and the wood or in its root systems. And depending on the type of tree, it’s usually broken into above ground biomass and below ground biomass. So, a lot of carbon can be stored in the root and the root system. That is also not even to begin to touch carbon that can be stored in the soil which is not something that we work on estimating and is a separate very difficult problem. To be a little bit more specific, oftentimes what we’re looking at estimating is the above ground biomass portion. But the bit about leaves is that they can be a useful indicator of where there’s a tree. And so, it’s not so much that the leaves themselves are where the carbon is being stored but rather that the leaves are a good way to identify highly wooded areas.

PD It’s an interesting way to think about trees then being – tree trunks being like basically compact carbon storages that we want to have as many of as possible. Yeah.

HH Yeah, yeah, there’s a lot of really cool innovation going on right now in climate tech in general. There are start-ups that are trying to do direct air capture. So they literally just hoover CO2 right out of the air and bury it underground or compress it, or use it for other sources. So, there’s a lot of cool sort of manmade initiatives. Trees are like the original, they’ve evolved for thousands of millennia to do exactly this. So, they’re such a great tool in the fight against extreme climate change.

PD So we talked about basically measuring the capacity of existing forests, how about areas where deforestation already took place, do you use satellite imagery also for that?

00:13:34

HH Yeah. Areas where deforestation has already taken place where you’re initiating a reforestation project come with a different set of challenges. So one of the challenges is that depending on the dataset you are using, trees are really small from space if they’re babies. So if you’re looking at a fully grown forest, if you’re looking at a really dense rainforest, you can see that from space. If you’re looking at twigs in a field that you’re hoping will become trees, you can’t really see that. So, in the earlier stages of reforestation projects, we rely much more heavily on either field crews or local airborne imagery because spaceborne imagery just doesn’t have the resolution to pick up on some of those things.

PD You mentioned a third type of carbon capture which was increasing the carbon density of existing forests. Can you explain how that works?

HH Yeah. So these projects are called IFM or Improved Forestry Management. And basically, these have to do with supporting whoever owns the forest and managing the forest in a better way, that allows it to capture more carbon without sacrificing some of the objectives of whoever owns it. So, one example that I can think of off the top of my head: in some forests there have been some really compelling pieces of research showing that most of the carbon or a good chunk of the carbon is contained in relatively few trees. So, the older, grizzlier, gnarlier trees that have been around for a long time hold much more carbon than some of the younger trees. So, if you are holding an IFM project, perhaps one way of making sure that you contain as much carbon as possible is by only removing younger trees or by creating opportunities for larger trees to continue to grow. That’s one example, there are many more. But we’re running out of my forestry depth there, so I’ll leave it at that.

PD Is there some sort of positive feedback loop, there’s more carbon in the atmosphere to make plants grow faster also, isn’t that helping a little bit with reforestation?

HH That’s a good question. We are seeing some instances of what’s called global greening so that things are sort of overall more green. But I think the difficulty is that it’s not just more carbon. We’re seeing these broader weather systems at play and so we’re seeing desertification happening. We’re seeing human induced destruction of the Amazon rainforest. And so, I think there’s a lot of complex things happening beyond just additional carbon dioxide in the atmosphere.

PD Right, yeah, it’s probably a very small effect then.

HH Yeah. I don’t know but that’s an interesting question.

00:16:12

PD So when you work with this kind of data, what is the biggest challenge that you’re facing?

HH It’s a lot of data. I think this is a challenge that anybody who works with satellite imagery faces: it’s a lot of data and you need to be able to process it. Speaking to what I said right at the beginning, a lot of the challenges in the technology are similar. So the data is large and you need a lot of it. You need high resolution data for some things. So, when I was looking at aircraft previously, if you want to be able to differentiate different types of aircraft you need higher resolution, if you care about what the wing tip looks like. And similarly, if you care about delineating tree crowns for example, you’re going to need high resolution imagery. So those are similar challenges. And then another one is co-registration of different pieces of information. So, if you have satellite imagery, lidar and field plots and you’re trying to make use of those all together you need to make sure that they’re aligned properly, which can be very challenging. If you have a field plot, even one that has the latitude and longitude of every tree, it’s very difficult to get a reliable GPS signal under a dense forest canopy. And so making sure that those pieces of data all properly align so that your models are meaningful is a kind of a non-trivial challenge at times.

PD And can you give us a sense for the scale of the data that you feed into a model when you’re training it and maybe also for the scale of the neural network that you are using?

HH I can speak to the scale of the data. Recently one of my colleagues has been developing a model over most of the area of Brazil. So, Brazil contains a ton of the world’s rainforest. People talk about the Amazon as the lungs. And that was around [17:57] of like 10 terabytes of data at a relatively low resolution and that’s just for the optical data. So that doesn’t include any of the lidar or hyperspectral stuff. With regards to the size of the models, unfortunately I can’t speak to that. When I first started working at Pachama I was all ready to build all the models, realised we didn’t have all the data pipelines and have gone further and further into the backend, [18:21] pipelines [18:22].

PD So I’m familiar, yeah. So, if I understand correctly then on the modelling side you’re still experimenting and trying to find the right approach?

HH Yeah, definitely. I think that it’s an area of active research both in the private sector, Pachama and also within the academic community at large. And I expect that we’ll continue to see as new sensors come on more and more different approaches proliferating.

00:18:48

PD From this huge amount of input data what are the most important sources of input? As you mentioned, measurements in the field, you mentioned satellite imagery and basically data in different types of the spectrum. What is most important, does the visible spectrum play a role at all or are there other parts that are more important?

HH Yeah. The visible spectrum does play a role. I think in general, multispectral is something that we pull from. So simple things like the Normalised Difference Vegetation Index or NDVI is composed from just the eight multispectral bands from Landsat, of which I think it uses four.
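
For readers who have not met NDVI before, here is a minimal Python sketch of the standard formula, which uses two bands: NDVI = (NIR - Red) / (NIR + Red), where NIR is near-infrared reflectance and Red is red reflectance. The function name and the reflectance values are hypothetical, and this is a per-pixel illustration rather than anything taken from Pachama's pipeline; in practice the same arithmetic would be applied to whole image arrays.

```python
def ndvi(red, nir):
    """Normalised Difference Vegetation Index for a single pixel.

    red and nir are reflectance values for the red and near-infrared bands.
    Returns None when both bands are zero, where the index is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total


# Hypothetical reflectances: healthy vegetation reflects strongly in the
# near-infrared and absorbs red light, so it scores close to 1.
dense_canopy = ndvi(red=0.05, nir=0.50)
bare_soil = ndvi(red=0.25, nir=0.30)
print(dense_canopy > bare_soil)  # True
```

Values range from -1 to 1, with dense green vegetation towards the top of the range, which is why the index works as a simple proxy for how much vegetation is present.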

PD What does multispectral mean, is it infrared, or UV or is it even further away from visible light?

HH It will be infrared and UV, yeah. And the only distinction that we make at least between multispectral and hyperspectral is something like Landsat would be considered multispectral. You’ve got eight bands, three of which are in the visible spectrum. Whereas hyperspectral would be you have hundreds of bands or much more narrow wavelengths. So, at present we’ve only been using multispectral imaging.

PD And do weather obstructions play a role? I would imagine if you look at a forest from above with a lot of fog or clouds and things that can happen, maybe there’s no daylight.

HH Yeah, that is a huge pain point which I didn’t appreciate as much in my previous work. Because a lot of the areas that we’re trying to monitor right now are in dense rainforests, they are dense rainforests. And there’s dense rain there all the time and it makes data collection really challenging. If you want to get a cloud free image your options sometimes are very limited and in some cases, we can’t find a single cloud free image because these areas receive such dense rain. So, it’s a huge challenge. Occasionally we collect our own data from airborne sources and sometimes the folks that we work with in Brazil will be like, “We can try flying again next month but it’s going to rain forever.” So definitely in these tropical environments weather obstructions can be a huge challenge.

PD Is the humidity or the probability for rain related to the vegetation maybe? Could that be even used in a positive way? I mean I don’t know, I’m just guessing now but I would imagine if you deforest a big area you will be lacking the humidity afterwards, you won’t have that much fog or things like that.

HH Yeah. I don’t know. That’s a good question because we do see areas that have been deforested experiencing a sense of desertification because in addition to the carbon storage benefits, trees provide a huge benefit for the landscape as a home for a wide range of animals but also they hold on to soil, they improve broader drainage quality. So, I don’t know. I’ll run that by the team and see if anybody can think of anything. That’s a great question though.

00:21:43

PD So for trying to solve this problem basically of verifying carbon capture technologies, what do you think is the impact that machine learning has on this? Would this be something that would be possible without it or is the use of data and machine learning a massive improvement in this area?

HH Our hope and the premise on which Pachama was founded is that machine learning is going to be a huge force multiplier in this space. As I said before, the industry standard for carbon verification is sending folks into the forest maybe once every five years. This is time consuming and costly and prevents scaling. By using machine learning technologies we can scale this to have more frequent updates of the state of different forests. So, we can provide higher quality, more frequently updated data about projects, which will in turn provide higher confidence in the value of these credits. And we can scale it up to larger areas, so you don’t have to be a big landowner to be incentivised to protect your forest or your land. So all in all the hope is that it makes it faster, more reliable, cheaper, and more widely available.

PD So if I understand you correctly, basically the scalability of the approach completely depends then on data and evidence that you are developing at the moment?

HH Yeah, absolutely and that depends on biome. You can’t take a model that you trained in Brazil and try and run it in Sweden, it’s just not going to work. So, there are a lot of regional complexities that come into play.

DD Heidi, what advancements if any have you seen in the last couple of years in the field of satellite imaging?

HH So I think when we last spoke one of the things that I said was going to be really hot was small satellite constellations. And not to toot my own horn but that was right. There have been so many small satellite constellations that have come up over the past two years. There were larger players like Planet and DigitalGlobe that we were already aware of but also smaller players as well and they ran different types of sensors. So, these sort of like niche small sat companies have really come to the fore. I still think there’s kind of a gap in capitalising on that data. Being a start-up in the hardware specific space, launching satellites with sensors on them is really challenging so they haven’t taken off super rapidly, but I do think there’s a lot of potential there. So, I think that’s probably still the biggest one and still something that we’re watching.

00:24:24

DD And unfortunately that brings today’s mini episode with Heidi to a close. If you really enjoyed this conversation then please do check out the first episode with Heidi, that’s episode seven, you’ll find it on the website. So, it just remains for me to thank Heidi again, a real pleasure talking to you, thanks so much for coming on today.

HH Thank you all so much, it’s been a privilege.

DD And thanks also to Philipp.

PD Thanks for having me, yeah.

DD And thank you to you, we look forward to seeing you on the next episode.