Weeknotes: Seeing the trees for the forest

17 Mar 2026

A couple of months ago I wrote about how I was playing around with multiple data input layers for an area of forest in northern(ish) Sweden, experimenting with whether I could refine the 10 metre-per-pixel resolution land cover map down to 1 metre in resolution by bringing in other data sources, such as a 1 metre resolution digital elevation model (DEM). I've not had much time to play with that experiment since, but I'd made enough progress to submit a talk to an open source/open data conference on the topic, and now that that talk's been accepted, I thought I'd better get a move on and do some more work on this!

Here's where I'd got to:

[Comparison slider: Original (left) vs Hybrid (right)]

You can see the original 10m Nationella Marktäckedata (the land cover map) on the left, and on the right my first attempt to refine it down to 1m, which focussed on refining the lake boundaries (and yes, I had to get more mileage out of that comparison slider). Now let's see what other data analysis we can bring to björn, erm, I mean bear, on the topic.

If you have a look at the maps above, you'll perhaps spot that the road that comes from the middle left of the map and around the top is removed in my refined map. This was deliberate, as I didn't think trying image analysis tricks was really the way to go for human-made structures like roads and buildings. These tend to already be recorded accurately in vector datasets, and so I deliberately erased the roads and buildings in the raster map, filling in the gap with the nearby land cover types, with the idea that I'd then apply the vector data on top of this.

I had hoped to use data from the same Swedish government data portal for this; Lantmäteriet is an excellent resource of open data, and where I got much of the data I've been using for this exploration. They do have a buildings and roads vector data product, Topografi 10M, but unfortunately for this particular product I wasn't able to get access due to not being resident in Sweden (it required my "personnummer" which is similar to what a national insurance number is in the UK). My understanding, albeit filtered through my elementary-level Swedish, is that building details start to reveal personal identifying information, so for GDPR reasons they place stricter controls on access to that particular dataset.

So I instead turned to OpenStreetMap. OSM, whilst amazing, isn't perfect, particularly in more remote areas: whilst it has the roads for the area I'm interested in, it does seem to be missing buildings. This is totally understandable as I'm focussing on quite a remote area (which helps with the wildlife interest, which I'll get into in a future post), but it'd have been interesting to see if Topografi 10M also had the buildings in. I've been chatting with a potential Swedish collaborator to see if we can make that happen, but for now I'll settle for just adding in the roads.

To get the OSM data I used osmnx, which is super-simple to use, just a dozen lines or so of Python got me what I needed:

import geopandas as gpd
import osmnx as ox

WIDTH_MAP = {
  "motorway":     12.0,
  "trunk":         8.0,
  "primary":       6.0,
  ...
}
DEFAULT_WIDTH = 2.5

# Get data for area
bbox = [left, bottom, right, top] # area of interest in WGS84
roads = ox.features_from_bbox(bbox=bbox, tags={"highway": True})
roads = roads[roads.geometry.geom_type == "LineString"].copy()

# Some simple data hygiene for my use case
roads["highway"] = roads["highway"].apply(lambda x: x[0] if isinstance(x, list) else x)
roads["buffer_dist"] = roads["highway"].map(WIDTH_MAP).fillna(DEFAULT_WIDTH)

# Now reproject and add a buffer
roads = roads.to_crs(template.map_projection.name)
roads["geometry"] = roads.apply(
  lambda row: row.geometry.buffer(
    row.buffer_dist,
    cap_style="round",
    join_style="round",
  ),
  axis=1
)

# Save to a GeoJSON
road_polygons = gpd.GeoDataFrame(
  geometry=[roads.union_all()],
  crs=template.map_projection.name
)
road_polygons.to_file(output_path, driver="GeoJSON")

And with that, I had some roads to lay over my previously refined map. I've zoomed out a little so we get more roads and paths (please ignore the water areas that go dark, we'll talk about those in a moment):

[Comparison slider: Original vs Hybrid + OSM]

Not a bad improvement on the road data over the original map, particularly given how little code it was to get that data, and way better than I'd ever hoped to have got using image processing techniques.

As an added bonus, this also solves one of the leftover challenges from the summer project we had last year built by Finley Stirk. Finley built Topogmesh, a tool that took GIS data and converted it to 3D-printable models. One neat feature Finley implemented was to allow you to colour the map using OSM data, as you can see in the example I wrote about here. However, we had a challenge with roads, as OSM stores roads as lines along their middle, and it does not store their actual width. There are a few libraries out there that can help with this (e.g., osm2streets which Anil alerted us to), but they are quite cumbersome to use, which is why I've not found the activation energy to try to integrate them with Topogmesh. However, the above approach is probably Good Enough™ to get started with for most science communication purposes.

At this point we've clearly made some progress, but there are areas of water that I've managed to lose because they're too small for the method I used to refine the lake edges, and the land classes over land are still at 10 metre resolution, so the forest boundaries on land are still quite jagged. For the latter I'll probably use some image processing to smooth out the transitions, as there's not much other data I can see at the 1 metre resolution to let me better delineate between the different forest classes or between forest and wetlands.

For water my plan was just to refine the algorithm I used before. For that, over the entire area I'm interested in, I just looked for flat areas in the elevation map that corresponded with water in the land cover map. Although that worked well at the zoomed out view:

A screenshot of an artificially coloured map of an area of land, where each block of colour represents water or forest or wetlands etc.

There are clearly areas at the small scale where I'm not getting things right, as you can see from the dark areas in the previous more-zoomed-in map that go from blue (water) to black (unclassified). I managed to improve my method on this a little bit from the original attempt back in December, by allowing smaller level areas to be considered water and then removing false positives by only converting within a small buffer around water cells on the 10 metre land cover map, but then I found I had a challenge in the primary data.
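To make that idea concrete, here's a minimal sketch in plain Python. It isn't the code used for the maps in this post: the function names, the height tolerance, the minimum cluster size and the buffer width are all made up for illustration, and it assumes the 10 metre water mask has already been resampled onto the same grid as the 1 metre elevation data.

```python
from collections import deque

def flat_clusters(dtm, tolerance=0.05, min_cells=4):
    """Group 4-connected cells whose height is within `tolerance` of the
    cell the group was seeded from; drop groups smaller than `min_cells`."""
    rows, cols = len(dtm), len(dtm[0])
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            seed = dtm[r][c]
            seen[r][c] = True
            queue, cells = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < rows and 0 <= nx < cols):
                        continue
                    if seen[ny][nx] or abs(dtm[ny][nx] - seed) > tolerance:
                        continue
                    seen[ny][nx] = True
                    queue.append((ny, nx))
            if len(cells) >= min_cells:
                clusters.append(cells)
    return clusters

def refine_water(dtm, coarse_water, buffer=1, **cluster_args):
    """Mark a flat cell as water only if it lies within `buffer` cells of
    a water cell in the (already resampled) coarse land cover mask."""
    rows, cols = len(dtm), len(dtm[0])

    def near_coarse_water(y, x):
        for ny in range(max(0, y - buffer), min(rows, y + buffer + 1)):
            for nx in range(max(0, x - buffer), min(cols, x + buffer + 1)):
                if coarse_water[ny][nx]:
                    return True
        return False

    water = [[False] * cols for _ in range(rows)]
    for cells in flat_clusters(dtm, **cluster_args):
        for y, x in cells:
            if near_coarse_water(y, x):
                water[y][x] = True
    return water

# Toy data: a lake at 10.0 m top left, and a flat non-water area at 14.0 m.
dtm = [
    [10.0, 10.0, 10.0, 12.1, 12.9],
    [10.0, 10.0, 10.0, 12.4, 13.3],
    [10.0, 10.0, 11.2, 12.8, 13.9],
    [11.0, 11.5, 12.0, 14.0, 14.0],
    [11.9, 12.6, 13.1, 14.0, 14.0],
]
coarse_water = [[r < 2 and c < 2 for c in range(5)] for r in range(5)]
water = refine_water(dtm, coarse_water)
```

In the toy data both the lake and the flat patch in the bottom right are found as level clusters, but only the lake cells sit close enough to water in the coarse mask to be converted, which is the false-positive filtering described above.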

The larger area I'm working on shown by the last map is in fact made up from lots of smaller tiles of data, and it turns out that not all the tiles are made at exactly the same time. In the elevation map, this comes through as the fact that the water levels vary between tiles in places, and that's what we see here:

Another artificially coloured map this time focussing on a lake. The top two thirds of the lake are coloured blue, and then the bottom third is black, with a straight horizontal delineation between the two colours.

The lake here sits on a tile boundary in the elevation map I'm using, and the water level was slightly higher at the time the top half of the area was measured versus the lower half. Thus my algorithm that looks for large clusters of pixels at the same height ignores the smaller bottom half of the lake. Perhaps I could play around more with tolerances to fix that, except that I know that in other places the height difference between water and wetlands is of a similar size to what we see here, so I feel that way madness lies. And it won't solve the other problem I have with my initial approach, which is rivers that slowly descend over a long distance:

Another artificially coloured map, where two bodies of water in blue can be seen, and the river connecting them is in black.

To solve these properly I'll need to do water flow modelling, which will likely also help with the half-filled lakes. So that's now a reading problem: there's a good amount of work on water flow models for this sort of thing, but I need to actually understand which ones would be appropriate and have existing packages I could use.

At this point, I decided to set out on a side quest. I read a paper by Anil and David Coomes where they'd been looking at Lidar data of bits of Scotland to identify trees, which was a fascinating study, and it piqued my interest. I remembered that there is forest lidar data available in the Swedish GIS portal, so I thought I'd have a play with this and see what I could extract from that. Spoiler alert: it's really quite interesting what you can find in lidar data :)

Before I dive into this any further, I need to clarify some terminology. Up to now in this post, and the previous post from December, I've referred to the elevation map I'm using as a Digital Elevation Model (DEM); however, technically what I'm using is a Digital Terrain Model (DTM), which means it follows the actual ground level. This is opposed to a Digital Surface Model (DSM), which includes things like buildings and trees. I get confused between these two regularly, so apologies for the number of times those two acronyms are about to appear.

Given I had a DTM from Lantmäteriet already, and most of the work I do is with raster maps, the first thing I thought I'd do is turn the forest lidar data from Lantmäteriet into a DSM and compare the two. Originally I wrote a little Python script to do this, but in the end I just used the lidR package for R, which has a bunch of other interesting bits I'll explain below. I'm not a big R user - it's in the set of languages I can read but not write - so what I'm doing here is going to be very much beginner stuff. Apologies to any tree analysis experts here, but I was battling my ignorance on two fronts on this one :)

Here is the ground map (DTM) data that I already had for this area (Lidar tile 707_60_2500 from the Lantmäteriet dataset if you want to play along at home):

A false colour image showing the elevation change over an area of land with a river and some small islands in it. The change is very smooth over the land, and the water areas are a uniform colour as they're a uniform height (for once).

Using the lidR package you can easily convert that data to a DSM raster. It turns out that converting lidar data to a height raster is more nuanced than one might expect (which I realised when I did my initial Python implementation), and if you're interested in understanding more about this then the lidR documentation has an excellent explainer on the topic. In general the lidR package documentation is brilliant, as it doesn't just explain how to use the package, but what the different methods trade off. A great resource for a newbie like myself.
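For a feel of what the simplest version of that conversion looks like, here's a sketch of a point-to-raster approach: drop every lidar return into a grid cell and keep the highest one. This is an illustration only, not the earlier Python script nor what lidR does internally; the function name and the convention that the origin is the top-left corner of the grid are assumptions for the example.

```python
import math

def points_to_dsm(points, origin_x, origin_y, cell_size, rows, cols):
    """Rasterise (x, y, z) points by keeping the highest z in each cell.

    (origin_x, origin_y) is assumed to be the top-left corner of the grid,
    so row 0 is the northern edge. Cells with no points stay as None.
    """
    dsm = [[None] * cols for _ in range(rows)]
    for x, y, z in points:
        col = math.floor((x - origin_x) / cell_size)
        row = math.floor((origin_y - y) / cell_size)
        if 0 <= row < rows and 0 <= col < cols:
            if dsm[row][col] is None or z > dsm[row][col]:
                dsm[row][col] = z
    return dsm

# Four made-up returns on a 2 x 2 grid of 1 metre cells.
points = [
    (0.2, 1.8, 101.0),  # ground return
    (0.7, 1.1, 115.5),  # tree top in the same cell as the return above
    (1.5, 1.5, 103.2),
    (0.4, 0.6, 100.8),
]
dsm = points_to_dsm(points, origin_x=0.0, origin_y=2.0, cell_size=1.0, rows=2, cols=2)
```

Even this toy version hints at the nuance: one cell ends up with no value at all because no return landed in it, and the top-left cell's height depends entirely on whether a return happened to hit the tree top.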

Anyway, thanks to lidR I then get my DSM:

Another false colour image like the last one, but now it's much more noisy and you can see what look like individual trees all over the map.

Which you can then use to generate a map of tree heights, a crude version of what is referred to as a Canopy Height Model (CHM) in the literature, just by subtracting the DTM from the DSM:

This is another false colour map of the same area, with the trees again visible, but the colour gradient is now mapping to tree heights rather than following the contours of the land.

The DSM and CHM look very similar, but you can see that in the CHM the trees have a more uniform level and are more likely to be the brightest colour, because the land height variation has been removed.
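The subtraction itself is about as simple as raster maths gets. Here's a sketch, assuming both rasters are already on the same grid; the function name is made up, and treating missing cells as no-data and clamping negative differences to zero are choices made for this example rather than anything from the processing above.

```python
def canopy_height(dsm, dtm):
    """Cell-by-cell DSM minus DTM for two rasters on the same grid.

    Cells missing from either input (None) stay None, and small negative
    differences are clamped to zero (an assumption for this sketch).
    """
    chm = []
    for dsm_row, dtm_row in zip(dsm, dtm):
        row = []
        for surface, ground in zip(dsm_row, dtm_row):
            if surface is None or ground is None:
                row.append(None)
            else:
                row.append(max(0.0, surface - ground))
        chm.append(row)
    return chm

# Made-up heights in metres above sea level.
dsm = [[115.5, 103.2], [100.8, None]]
dtm = [[100.5, 103.0], [101.0, 100.0]]
chm = canopy_height(dsm, dtm)
```

Here the top-left cell comes out as a 15 metre tree, its neighbour as roughly 20 centimetres of undergrowth, and the cell where the surface dips just below the terrain model is clamped to zero.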

The lidar data is fascinating to explore. Even in the crude DSM or CHM raster above, you can start to see the patterns in the trees. It looks to my naive eye that we have some level of land management going on here as you can see distinct regularly spaced lines running through a lot of the forest. Given how much Sweden relies on wood for industry, this could be an entirely managed forest. You can also see other features in there I didn't expect to see, for instance the power lines!

A screenshot of a set of points in 3D space taken from the lidar data. You can see not only trees, but also a clearing where the trees have been cut back and then a series of power lines run through that area.

There's an interactive version of that view here - go have a quick spin (literally), and you'll get a feel for just how detailed the lidar data is! You can make out not just the power lines, but the support towers too, and see how the line bows under its own weight between those towers. My mind was blown a little by how immersive it is once you start looking around the world like this.

It's worth noting that the UK also has a lot of lidar data, so at some point I should look more into that.

Anyway, back to our Swedish forests. Now that I have a map of the tree canopy, the final thing I looked at was could I find all the actual trees? Looking at the above images it feels like one can see individual trees, so can we algorithmically pick them out? Well, it turns out yes you can, somewhat. Obviously any attempt to work out where the individual trees are will be a guess, so it's important to understand that any attempt by human or computer will be a little error prone, particularly where you have dense coverage and you have other things that look like trees. If I was doing this properly then I'd probably do some filtering of the lidar data to remove things like the aforementioned power lines for instance, as those will get misclassified. But because we have the land cover map and OSM data we could do that if we wanted.

However, this is all a distraction from more pressing requests on my time, so I just threw the raw lidar data at the lidR package to see what I'd get, and (at least for a layperson like myself) it was quite impressive:

Another false colour map of the same area as the last few, but now it is covered in red dots that represent where the computer thinks individual trees are planted.

If we compare that with the satellite view of the same area:

A satellite view of the same area again, but as humans see it. You can see the forested and nonforested regions map quite well to where the red dots are on the previous image.

(you're saved yet another pair of swiping images as I couldn't get the satellite data to line up perfectly with the data model :)

At this level, if you look particularly in the lower third of the map, you can see the open versus wooded areas are picked out quite well. I will say that I think the model is a little dense, and that there are too many identified trees, so I need to play with the parameters more. One thing I could do is tie the parameters to the actual expected tree type, because the NMD land cover map does break down forest by type: pine, spruce, birch etc. The point being that this isn't just a one click and you're done, you do have to tune it, but still, it's pretty neat that we have libraries that will do this for us.
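To show why the parameters matter so much, here's a sketch of one common way of finding tree tops in a canopy height raster, a local maximum filter: a cell counts as a tree top if it is tall enough and strictly higher than everything else in a window around it. I'm not claiming this is what produced the red dots above; the function name, the window sizes and the 2 metre minimum height are assumptions for illustration.

```python
def find_tree_tops(chm, window=1, min_height=2.0):
    """Return (row, col) for cells that are at least `min_height` and
    strictly higher than every other cell within `window` cells.

    Flat-topped plateaus produce no tree top with this strict rule.
    """
    rows, cols = len(chm), len(chm[0])
    tops = []
    for r in range(rows):
        for c in range(cols):
            height = chm[r][c]
            if height is None or height < min_height:
                continue
            is_top = True
            for nr in range(max(0, r - window), min(rows, r + window + 1)):
                for nc in range(max(0, c - window), min(cols, c + window + 1)):
                    if (nr, nc) == (r, c) or chm[nr][nc] is None:
                        continue
                    if chm[nr][nc] >= height:
                        is_top = False
            if is_top:
                tops.append((r, c))
    return tops

# A made-up canopy height raster (metres) with three crowns in it.
chm = [
    [0.0,  3.0, 0.0,  0.0, 0.0],
    [3.0, 12.0, 4.0,  9.0, 2.0],
    [0.0,  4.0, 5.0,  3.0, 0.0],
    [0.0,  0.0, 3.0, 14.0, 3.0],
    [0.0,  0.0, 0.0,  3.0, 0.0],
]
small_window = find_tree_tops(chm, window=1)  # 3 x 3 cell neighbourhood
large_window = find_tree_tops(chm, window=2)  # 5 x 5 cell neighbourhood
```

With the small window all three crowns are reported as trees; with the larger window only the tallest survives. That's the sort of knob that would need tuning, perhaps per forest type, to stop a model coming out too dense.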

As an example, I did try different CHM algorithms to see what impact they made, and you can see here that if you're not careful you can get glitchy results:

[Comparison slider: P2R CHM vs Pitfree CHM]

I imagine having lines of trees over the water would be a fun trick to pull off, but probably not true. It's not even the case that this is misidentifying power lines, as there's no lidar data over the water (hold that thought); it's an artifact of one of the standard algorithms when pushed a little harder than it expects (this was using pitfree for those interested).

As a closing thought, not only has playing with lidar data and canopy maps been an interesting insight into a side of GIS I've not explored before, I did realise that perhaps my water problem has a new friend! Note my comment above about there being no lidar points over water in this dataset. I had a look around all of this area, and to my surprise the lidar-backed DSM actually provides a much clearer impression of where water is and isn't at a 1 metre resolution. It even seems to cope with all the small rivers we can see in the land cover map. So for all this was a side quest, I now have a new angle for tackling how to refine the water edges better!
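As a sketch of how that new angle might work, assuming (as observed in this dataset) that water shows up as cells with no lidar returns at all: count the returns per cell and flag the empty ones as water candidates. The function names and the top-left origin convention are made up for the example, and since gaps in a point cloud can have other causes, the result would only be a candidate mask to combine with the land cover map rather than an answer in itself.

```python
import math

def return_counts(points, origin_x, origin_y, cell_size, rows, cols):
    """Count how many (x, y, z) lidar returns fall in each grid cell.

    (origin_x, origin_y) is assumed to be the top-left corner of the grid.
    """
    counts = [[0] * cols for _ in range(rows)]
    for x, y, _z in points:
        col = math.floor((x - origin_x) / cell_size)
        row = math.floor((origin_y - y) / cell_size)
        if 0 <= row < rows and 0 <= col < cols:
            counts[row][col] += 1
    return counts

def water_candidates(counts):
    """Flag cells with no returns as possible water."""
    return [[count == 0 for count in row] for row in counts]

# Made-up returns either side of a narrow north-south channel.
points = [
    (0.5, 1.5, 101.0), (0.6, 1.4, 112.3), (2.5, 1.5, 100.9),
    (0.5, 0.5, 100.7), (2.2, 0.3, 109.8),
]
counts = return_counts(points, origin_x=0.0, origin_y=2.0, cell_size=1.0, rows=2, cols=3)
candidates = water_candidates(counts)
```

In this toy grid the middle column has no returns in either row, so it is flagged as a one-cell-wide channel, which is the kind of small river the flat-area approach misses.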

Tags: weeknotes, R, LIDAR, Sweden

Tech notes by Michael Winston Dales
