Weeknotes: More treeing to see the trees from the forest
20 Mar 2026
In my previous weeknote I discussed my experiments in using lidar data for forests to try to locate individual trees using the lidR package for R, and I also showed a small section of the lidar data in a little viewer:
I got some feedback on both these things on the EEG zulip, and so I did a little more work on my viewer. I have to say, this is mostly for my own benefit, as I find exploring lidar data both satisfying and useful. I find being able to explore the data in 3D a vastly superior experience to using 2D rasters, the only challenge being one of scale.
That said, one of the scaling problems was just my own presumptions. One of the problems of being old is that your expectations of what is a reasonable amount of data solidify at some point; I'd assumed that the 200K points in the above example was going to be quite the stress test for the browser, but on any modern machine it turns out that's really not a lot of data for webGL to handle. In the end I threw nearly 20 million points at it and still it's very usable: I get a little stuttering on my M3 MacBook Pro, but my desktop Mac Studio shrugs it off and wonders when we'll do something actually challenging:
I can spin that around and zoom in no problem, and so I need a little internal expectation reset on just what you can do in webGL and three.js and get away with. Yes, I have a stupidly fast computer sat on my desk, but given I have a stupidly fast computer I might as well use it! But like I say, this works fine on most machines I had to hand; the only limiting factor that stops me sharing it with you is data volume.
For the original viewer I was using JSON to store the points, and even with the decimal places limited, the original powerline demo was about 7MB of JSON, and the larger map of Kullberg was 600 MB - quite spicy, as the kids say. I managed to rein that in somewhat by migrating from JSON to the Point Cloud Data (PCD) format, which has a compressed binary variant; that reduced the data volumes to about 1/3 of the original, and three.js has native support for fetching and parsing PCD files.
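To make the size difference concrete, here's an illustrative Python sketch of writing the uncompressed binary flavour of PCD: an ASCII header followed by tightly packed little-endian float32 records, so each xyz point costs exactly 12 bytes versus the 30-odd characters it takes in JSON. (The compressed variant, `DATA binary_compressed`, additionally LZF-compresses the body; I've left that out here.)

```python
import struct

def write_pcd_binary(points, path):
    """Write (x, y, z) tuples as an uncompressed binary PCD v0.7 file.

    A minimal sketch: ASCII header, then each point packed as three
    little-endian float32s (12 bytes per point, no delimiters).
    """
    header = "\n".join([
        "VERSION 0.7",
        "FIELDS x y z",
        "SIZE 4 4 4",
        "TYPE F F F",
        "COUNT 1 1 1",
        f"WIDTH {len(points)}",
        "HEIGHT 1",
        "VIEWPOINT 0 0 0 1 0 0 0",
        f"POINTS {len(points)}",
        "DATA binary",
    ]) + "\n"
    with open(path, "wb") as f:
        f.write(header.encode("ascii"))
        for x, y, z in points:
            f.write(struct.pack("<fff", x, y, z))
```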
That file size reduction was even allowing for the fact that I swapped to pre-computing the colour of each dot on the backend, storing it in the PCD file. PCD has to be one of the less conventional formats for storing RGB colour data, which it does as a float of the hex value, but who am I to judge? :) Longer term I realise that pre-computing the colour on the backend is a bad idea, as at some point I'll be pulling in tiles of point clouds or wanting to switch between height map and, say, point classification colouring, as it turns out that in the source data each point is classified:
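That "float of the hex value" trick is odder than it sounds: the 24-bit 0xRRGGBB integer is reinterpreted bit-for-bit as a float32 rather than converted numerically, which is why the values look like garbage if you print them as numbers. A little Python sketch of the round trip:

```python
import struct

def pack_rgb(r, g, b):
    """Pack 8-bit RGB into the single float that PCD uses for colour.

    The 24-bit 0xRRGGBB integer is reinterpreted as the bit pattern
    of a float32, not converted to its numeric value.
    """
    rgb_int = (r << 16) | (g << 8) | b
    return struct.unpack("<f", struct.pack("<I", rgb_int))[0]

def unpack_rgb(rgb_float):
    """Invert the packing back to an (r, g, b) tuple."""
    rgb_int = struct.unpack("<I", struct.pack("<f", rgb_float))[0]
    return (rgb_int >> 16) & 0xFF, (rgb_int >> 8) & 0xFF, rgb_int & 0xFF
```

Because the top byte of the 32-bit pattern is always zero, the round trip is lossless, even though many of the resulting floats are denormal numbers.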
This explains my observation last week that the DSM I generated from the lidar data had no water in it: I'd filtered out the points that were likely to be trees based on classification, and there is a water classification 🤦 Still, the point stands that this is useful in my water detection, just the reason for it is more interesting than I thought.
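For reference, LAS files use the standard ASPRS classification codes (any given dataset may only use a subset of them). A hypothetical Python sketch of splitting classified points, assuming you've already pulled x/y/z/classification out of a LAS reader:

```python
# A subset of the standard ASPRS LAS classification codes.
ASPRS = {
    1: "unclassified",
    2: "ground",
    3: "low vegetation",
    4: "medium vegetation",
    5: "high vegetation",
    6: "building",
    9: "water",
}

def split_by_class(points, keep):
    """Split (x, y, z, cls) points into (kept, dropped) by class code.

    A toy stand-in for the filtering a LAS library would do for you;
    `keep` is a set of ASPRS codes.
    """
    kept, dropped = [], []
    for p in points:
        (kept if p[3] in keep else dropped).append(p)
    return kept, dropped
```

Keeping only the vegetation classes {3, 4, 5} gets you the trees, and it also quietly throws away class 9, which is exactly the missing-water surprise above.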
I also mentioned last week that I'd been reading a paper by colleagues who had done a study of trees based on lidar data up in the Scottish Cairngorms. I actually have a copy of their lidar data now, all 153 GB of it. This made me realise that I need a way of making quick overviews of the data, as I've no way of knowing where is interesting to investigate in a sea of individual lidar files. For now I just pulled a couple out and tested those, hopefully as a start to connecting this experiment with other work going on in the group:
I don't think that hill is the most interesting bit of the Cairngorms, but it is actually made from multiple tiles of source data, which is an improvement on everything I'd done before, all of which had come from a single lidar scan file.
All of which raises the question: why would anyone want to use this beyond the novelty of it all?
Well, firstly, I don't think novelty is a bad thing: it encourages exploration and investigation. A lot of countries now have extensive, detailed lidar coverage openly available, and I'd love for it to be as accessible as Google Maps has been. I kinda feel there's no real technical challenge here, just that no one has worked out how to make money from it, so it hasn't happened. But there's so much information in the point cloud data when you start to explore it, a lot of it at a very human scale that is subtle and lost in 2D views. In that map of Kullberg you can see little jetties onto the water, paths through the forest, small buildings, none of which are visible on the satellite view or on OpenStreetMap. We have the data, it is open to the public, but the majority of people have no way to meaningfully engage with it. It's not a technical challenge I feel, just a funding one.
In the near term though I thought I'd try to find other uses for my little viewer. As I mentioned last week, I got the lidar data initially to look for trees, and I showed how the tree detection algorithms produced different results, and how you even need to tune the parameters to try to match the particular tree species you're dealing with. It's hard to see in QGIS, with its large dots and 2D view, how well things map, but perhaps it's easier in 3D? Why not have a play and see what you think?:
Here you can see two examples of tree detection at work, how they differ, and how they compare with the point cloud data directly. Is it useful? Maybe, perhaps more so if I'd actually put more effort into tuning the tree finding rather than writing pretty viewers, as then you could clearly see that one was better than the other :)
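To give a flavour of what's being tuned: lidR's local maximum filter (`lmf()`) treats a point as a treetop if it's the highest thing within a window around it, so the window size effectively encodes how far apart you believe crowns are. A toy Python version over a rasterised canopy height model, using a square window rather than lidR's circular (and optionally height-dependent) one:

```python
def find_treetops(chm, radius, hmin):
    """Toy local-maximum treetop finder, in the spirit of lidR's lmf().

    `chm` is a canopy height model as a list of equal-length rows.
    A cell is a treetop if it is at least `hmin` tall and is the
    maximum within `radius` cells. Too small a radius splits one
    crown into several trees; too large merges neighbours into one.
    """
    rows, cols = len(chm), len(chm[0])
    tops = []
    for i in range(rows):
        for j in range(cols):
            h = chm[i][j]
            if h < hmin:
                continue
            window = [
                chm[y][x]
                for y in range(max(0, i - radius), min(rows, i + radius + 1))
                for x in range(max(0, j - radius), min(cols, j + radius + 1))
            ]
            if h >= max(window):
                tops.append((i, j, h))
    return tops
```

On a tiny grid with a 5m and a 4m peak two cells apart, radius 1 finds both trees while radius 2 swallows the shorter one, which is exactly the species-dependent tuning problem in miniature.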
So, is all this worth it? I mean, yes, it's been a lot of fun and inspiring for me personally. But I think there are a few things I'd need to look at before doing more work on this:
- Having a look at things like Potree, where people have tried to make scalable point cloud viewers, to see how they deal with scaling the data.
- Similar to Cloud Optimised GeoTIFF (COG), which until recently has been the way to have scalable GeoTIFF delivery for exploratory work (now all the cool kids use zarr), there is a Cloud Optimised Point Cloud (COPC) specification.
Basically, how can I pull in data from a large dataset like all of the Cairngorms and have it scale both geographically across files and in terms of point cloud fidelity as you zoom in and out? I feel this should be a solved problem, so hopefully it is and I can just take something that's already there.
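The crudest version of the fidelity half of that problem is voxel thinning: keep one point per grid cell, with the cell size tied to zoom level. Potree and COPC do something far cleverer (octrees with points spread across levels so every level of detail is renderable on its own), but a hypothetical thinning pass like this is enough for quick overviews of big files:

```python
def voxel_thin(points, cell):
    """Keep one (x, y, z) point per `cell`-sized voxel.

    A crude level-of-detail sketch: the first point seen in each
    voxel wins. Halving `cell` roughly quadruples-to-octuples the
    points kept, so it maps naturally onto zoom levels.
    """
    seen = {}
    for x, y, z in points:
        key = (int(x // cell), int(y // cell), int(z // cell))
        seen.setdefault(key, (x, y, z))
    return list(seen.values())
```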
The other thing is that my viewer needs some better UI - I did make it work on touch devices, but it's still quite awkward to navigate to specific places at times.
Tags: weeknotes, R, LIDAR, Sweden, Cairngorms