Earlier this year I was invited to give two presentations at the State of the Map US 2015 conference, which was held in the United Nations Headquarters in NYC. What a venue!
As well as an update on the OpenStreetMap Carto project, I gave a presentation on what I see as some of the development prospects for OSM over the next few years. Making predictions is hard, especially about the future, etc., but I gave it my best shot.
I think it’ll be interesting to look back in a few years and see how much of what I discuss was prescient, and more interestingly, what topics I missed entirely!
One of the sections that interested me most (and took by far the longest to prepare the slides for) is the one looking at who our developers are, and how this changes over time. It’s just before the 18-minute mark in the video if you want to have a look.
Here are a couple of the charts, which provide food for thought (bear in mind they were produced in June, so the 2015 numbers reflect only the first six months of the year):
I think the future success of OpenStreetMap depends on improving these figures – we should be aiming to retain at least 40% of our developers after their first year of contributing. We’re clearly getting better at attracting new developers, but what do you think is stopping them from sticking around?
At State of the Map Europe I ran a workshop about openstreetmap-carto, the stylesheets that power the Standard map layer on OpenStreetMap.org and many hundreds of other websites. The organisers have published the video of the workshop:
Thanks to the organising team for inviting me to run this workshop. It was certainly well received by the audience, and I spent the rest of the day discussing the project with other developers.
I continue to be surprised by two aspects of openstreetmap-carto. The first is how much work is involved in making significant changes to the cartography: after a year and a half, to the casual observer, not much has happened! The second is that we now have a large group of people commenting on the issues and making pull requests. As of today there are 29 people whose contributions show up in the style, with over 700 issues opened and over 220 pull requests made. Much of the work is now done by Paul Norman and Matthijs Melissen, who help review pull requests for me and do almost all of the major refactoring. It’s great to have such a good team of people working on this together!
Here are a few screenshots from my new Lightning tile-server:
A refresh of the Transport Style, built on a brand new, state-of-the-art vector tiles backend.
Thunderforest Lightning makes it easy to create custom map styles sharing vector tile backends. Here’s a Dark variant for the Transport layer.
Vector tiles bring lots of other benefits – like high-performance retina maps!
Development is going full-steam, and Lightning will be launching soon. Check back for more updates!
Yesterday I was investigating the OpenStreetMap elevation tag, when I was surprised to find that the third most common value is ‘0.00000000000’! Now, I have my suspicions about any value below ten, but there are 13,832 features in OpenStreetMap that have their elevation mapped to within 10 picometres of sea level, roughly one sixth of the diameter of a helium atom. Seems unlikely, to be blunt.
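The picometre figure follows directly from the digits: eleven places after the decimal point claim a resolution of 10⁻¹¹ m, which is 10 pm. Here is a minimal Python sketch (not part of any OSM tooling) that reads a value of OSM’s `ele` key as a decimal string and works out the resolution it implicitly claims. The `looks_like_placeholder` heuristic and its three-decimal threshold are hypothetical choices, included only for illustration.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional

PICOMETRES_PER_METRE = Decimal(10) ** 12


def implied_resolution_m(value: str) -> Optional[Decimal]:
    """Return the step size (in metres) implied by the digits written in `value`.

    '0.00000000000' has 11 digits after the decimal point, so it claims a
    resolution of 1e-11 m. Returns None for values that are not plain numbers
    (e.g. '12 m'); this sketch does not try to parse units.
    """
    try:
        exponent = Decimal(value.strip()).as_tuple().exponent
    except InvalidOperation:
        return None
    if not isinstance(exponent, int):  # NaN/Infinity use 'n', 'N' or 'F' here
        return None
    return Decimal(1).scaleb(exponent)


def implied_resolution_pm(value: str) -> Optional[Decimal]:
    """Same as implied_resolution_m, converted to picometres."""
    resolution = implied_resolution_m(value)
    return None if resolution is None else resolution * PICOMETRES_PER_METRE


def looks_like_placeholder(value: str, max_decimals: int = 3) -> bool:
    """Hypothetical heuristic: an exact zero written with more decimal places
    than any survey could justify probably means 'unknown', not 'sea level'."""
    try:
        number = Decimal(value.strip())
    except InvalidOperation:
        return False
    exponent = number.as_tuple().exponent
    return number.is_zero() and isinstance(exponent, int) and -exponent > max_decimals
```

For example, `implied_resolution_pm("0.00000000000")` compares equal to 10, the 10 picometres mentioned above. Parsing with `Decimal` rather than `float` matters here: a float would throw away the trailing zeros that carry the spurious precision.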
It could, of course, be hundreds of hyper-accurate volunteer mappers, but I immediately suspect an import. Given the spurious accuracy and the tendency to cluster around sea level, I also suspect it’s a broken import where these picometre-accurate readings are more likely to mean “we don’t know” than “exactly at sea level”. Curious, I spent a few minutes using the Overpass API and found an example: Almonesson Lake. The NHD prefix on the tags suggests it came from the National Hydrography Dataset and so, as suspected, not from real people with ultra-accurate micrometers.
But what concerns me most is what I found when I had a quick look at the data layer for that lake: it turns out that there are three separate and overlapping lakes! We have the NHD import from June 2011 and an “ArcGIS Exporter lake” from October that year, both of which simply ignore the original lake created way back in February 2009, almost 5 years ago. There’s no point in having 3 slightly different lakes, and if anyone were to try to fix a misspelling, add an attribute or tweak the outline, they would have an unexpectedly difficult task. There is, sadly, a continual stream of imports that are often poorly executed and, worse, rarely revisited and fixed, and this is just one case among many.
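A query along these lines should turn up such features. This is a minimal Python sketch that only builds the Overpass QL text and sends nothing; the resulting query could be pasted into overpass turbo or sent to an Overpass endpoint. The 60-second timeout, the result limit and the requirement for at least one key beginning with `NHD` are my own illustrative choices, not the exact query used above.

```python
from typing import Optional, Tuple

# (south, west, north, east): the order Overpass expects for a bounding box
BBox = Tuple[float, float, float, float]


def ele_query(value: str, bbox: Optional[BBox] = None, limit: int = 20) -> str:
    """Build an Overpass QL query for nodes, ways and relations whose `ele`
    tag equals `value` exactly and that also carry a key starting with 'NHD'.

    Assumes `value` contains no double quotes (true for numeric values).
    """
    area = ""
    if bbox is not None:
        south, west, north, east = bbox
        area = f"({south},{west},{north},{east})"
    return (
        "[out:json][timeout:60];\n"
        # [~"^NHD"~"."] matches any key beginning with NHD, with any value
        f'nwr["ele"="{value}"][~"^NHD"~"."]{area};\n'
        # print only the tags, at most `limit` elements
        f"out tags {limit};"
    )
```

Calling `ele_query("0.00000000000")` queries the whole planet, which is slow on a public server; passing a small `bbox` around an area of interest keeps the request polite.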
Mistakes are made, of course, but it’s clear that data problems like these aren’t being noticed and/or fixed in a reasonable timescale – even just this one small example throws up a score of problems that all need to be addressed. Most of my own editing is now focussed on tending and fixing the data that we already have, rather than adding new information from surveys. And as the size of the OpenStreetMap dataset increases, along with the seemingly perpetual and often troublesome importing of huge numbers of features, OpenStreetMap will need to adjust to the increasing priority for such data-gardening.
If you want some OpenStreetMap leaflets to give out to potential new mapper recruits, you can order them and I’ll post them to you.
A little over a year ago I was plugging through setting up another OpenCycleMap server. I knew what needed installing, and I’d done it many times before, but I suspected that there was a better way than having a terminal open in one screen and my trusty installation notes in the other.
Previously I’d taken a copy of my notes, and tried reworking them into something resembling an automated installation script. I got it to the point where I could work through my notes line-by-line, pasting most of them into the terminal and checking the output, with the occasional note requiring actual typing (typically when I was editing configuration files). But to transform the notes into a robust hands-off script would have been a huge amount of work – probably involving far too many calls to sed and grep – and making everything work when it’s re-run or when I change the script a bit would be hard. I suspected that I would be re-inventing a wheel – but I didn’t know which wheel!
The first thing was to figure out some jargon: what’s the name of this particular wheel? It turns out that it’s known as “configuration management”. The main principle is to write code that describes the server setup, rather than running commands by hand. That twigged with me straight away. Every time I added more software to the OpenCycleMap servers, I had a sinking feeling that I’d need to type the same stuff in over and over on different servers; I’d prefer to write some code once, and run that code over and over instead. The code also needs to be idempotent, i.e. it doesn’t matter how many times you run it, the end result is the same. That’s about the sum of what configuration management entails.
There are a few open-source options for configuration management, but one in particular caught my eye. Opscode’s Chef is Ruby-based, which works for me since I do a fair amount of Ruby development and it’s a language that I enjoy working with. Chef is also what the OpenStreetMap sysadmins use to configure their servers, so having people around who use the same system would simply be a bonus.
What started off as a few days’ effort turned into a multi-week project as I learned Chef for the first time and plugged through creating cookbooks for all the components of my server. It took much longer than I’d initially expected, but 18 months on it was clearly worth it: I’d never have been able to run enough servers for all the styles I have now, nor keep up with the upgrades to the software and hardware, without it. It’s awesome.
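To make “idempotent” concrete, here is a minimal Python sketch contrasting a step pasted from installation notes with an idempotent one. It is only an illustration of the principle, not how Chef works internally (Chef recipes are written in Ruby), and the file name and the `listen 80` line are hypothetical.

```python
import tempfile
from pathlib import Path


def append_line(path: Path, line: str) -> None:
    """Notes-style step: paste it twice and the line ends up in the file twice."""
    with path.open("a") as f:
        f.write(line + "\n")


def ensure_line(path: Path, line: str) -> bool:
    """Idempotent step: describe the end state ('this line is present') and
    only change the file if that isn't already true. Returns True if it changed."""
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    # Don't glue the new line onto a final line that lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as f:
        f.write(prefix + line + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = Path(tmp) / "example.conf"  # hypothetical config file
        for _ in range(3):
            ensure_line(conf, "listen 80")
        print(conf.read_text().count("listen 80"))  # 1, however many runs
```

Running `append_line` three times would leave three copies of the line; `ensure_line` leaves exactly one, which is what makes it safe to re-run the whole setup after changing one part of it.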
So here are some tips, for those who have their own servers and are in a similar position to the one I was in.
- How many servers before it’s worth it? Configuration management really comes into its own when you have dozens of servers, but how few are too few to be worth the hassle? It’s a tough one. Nowadays I’d say that even if you have only one server it’s still worth it, just, since one server really means three, right? The one you’re running, the VM on your laptop that you’re messing around with for the next big software upgrade, and the next one you haven’t installed yet. If you’re running a server with anything remotely important on it, then having some Chef scripts to get a replacement up and running if the first goes up in smoke is a really good time-critical aid when you need it most.
- How do you get started with Chef? Well, it’s tough: the learning curve is like a cliff. Chef setups have three main parts: the server(s) you’re setting up (the “node”), the machine you’re pressing keys on (the “workstation”) and the confusingly-named “chef server”, which is where nodes grab their scripts (“cookbooks”) from. It makes sense to cut down on the learning, so I’d recommend using the free 5-node trial of their Hosted Chef offering. That way you only need to concentrate on the node and workstation setup at first, and when you run out of nodes, there’s always the open-source chef-server if the hosted platform is too expensive.
- Which recipes should I use? There are loads available on GitHub, and there are links all over the Chef website. In general, I recommend avoiding them, at least at first. Like I mentioned, the learning curve is cliff-like, and while you can do super-complex whizz-bang stuff with Chef, the public recipes are almost all vastly overcomplicated and, more importantly, hard to learn from. Start out writing your own; mine were little more than a list of packages to install at first. Then I started adding in some templates and a few script resources here and there, and built up from there as I learned new features. Make sure your Chef repository is in git, and that you’re committing your cookbook changes as you go along.
- Where’s the documentation? I’d recommend following the tutorial to get things all set up, while trying not to worry too much about the details. Then start writing recipes. For that, the resources page on the wiki tells you everything you need to know – start with the package resource, then the template resource, then on to the rest. There’s a whole bunch of stuff that you won’t need for a long time – attributes, tags, searches – so don’t try learning everything in one go.
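Chef recipes themselves are Ruby, so the following is not Chef code. It is a toy Python model, under my own assumptions, of the idea behind starting with the package and template resources: a recipe lists the packages and files a node should have, and each run changes only what differs from that description, so a second run does nothing. `FakeNode`, `converge` and the package names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class FakeNode:
    """Hypothetical stand-in for a server's state; nothing is really installed."""
    packages: Set[str] = field(default_factory=set)
    files: Dict[str, str] = field(default_factory=dict)


def converge(node: FakeNode,
             packages: List[str],
             templates: Dict[str, Tuple[str, dict]]) -> List[str]:
    """Bring `node` to the described state and return the actions taken.

    `packages` plays the role of a list of package resources; `templates`
    maps a path to (template text, variables), loosely like a template resource.
    """
    actions: List[str] = []
    for name in packages:
        if name not in node.packages:  # install only what's missing
            node.packages.add(name)
            actions.append(f"install {name}")
    for path, (template, variables) in templates.items():
        rendered = template.format(**variables)
        if node.files.get(path) != rendered:  # rewrite only on a real change
            node.files[path] = rendered
            actions.append(f"render {path}")
    return actions
```

On a fresh `FakeNode`, the first `converge` call installs every listed package and renders every file; calling it again with the same description returns an empty list, and changing one template variable re-renders only that file.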
I’ll be writing more about developing and testing cookbooks in the future – it’s a whole subject in itself!