OpenStreetMap Carto Workshop

At State of the Map Europe I ran a workshop about openstreetmap-carto, the stylesheets that power the Standard map layer on OpenStreetMap.org and many hundreds of other websites. The organisers have published the video of the workshop.

Thanks to the organising team for inviting me to run this workshop – it was certainly well received by the audience, and I spent the rest of the day discussing the project with other developers.

I continue to be surprised by two aspects of openstreetmap-carto. One is how much work is involved in making significant changes to the cartography – after a year and a half, to the casual observer not much has happened! But on the other hand, we now have a large group of people who are commenting on the issues and making pull requests. As of today there are 29 people whose contributions show up in the style, with over 700 issues opened and over 220 pull requests made. Much of the work is now done by Paul Norman and Matthijs Melissen who help review pull requests for me and do almost all of the major refactoring. It’s great to have such a good team of people working on this together!

A sneak peek at Thunderforest Lightning maps

Here are a few screenshots from my new Lightning tile-server:

transport

A refresh of the Transport Style, built on a brand new, state-of-the-art vector tiles backend.

transport-dark

Thunderforest Lightning makes it easy to create custom map styles sharing vector tile backends. Here’s a Dark variant for the Transport layer.

transport-dark-large

Vector tiles bring lots of other benefits – like high-performance retina maps!

Development is going full-steam, and Lightning will be launching soon. Check back for more updates!

Tending the OpenStreetMap Garden

Gardening Blog Photo1

Yesterday I was investigating the OpenStreetMap elevation tag, when I was surprised to find that the third most common value is ‘0.00000000000’! Now I have my suspicions about any value below ten, but there are 13,832 features in OpenStreetMap that have their elevation mapped to within 10 picometres – roughly one sixth of the diameter of a helium atom – of sea level. Seems unlikely, to be blunt.

It could of course be hundreds of hyper-accurate volunteer mappers, but I immediately suspect an import. Given the spurious accuracy and the tendency to cluster around sea level, I also suspect it’s a broken import where these picometre-accurate readings are more likely to mean “we don’t know” than “exactly at sea level”. Curious, I spent a few minutes using the Overpass API and found an example – Almonesson Lake. The NHD prefix on the tags suggests it came from the National Hydrography Dataset and so, as suspected, not real people with ultra-accurate micrometres.
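For the curious, a query along these lines will find such features. This is an illustrative sketch in Ruby that just builds an Overpass QL query string – the bounding box is made up for the example, not the one I actually used:

```ruby
# Build an Overpass QL query for features whose elevation is tagged
# with the suspicious value. The bounding box is hypothetical
# (roughly southern New Jersey, near Almonesson Lake).
bbox = "39.7,-75.2,39.9,-75.0"

query = <<~QUERY
  [out:json][timeout:60];
  (
    node["ele"="0.00000000000"](#{bbox});
    way["ele"="0.00000000000"](#{bbox});
  );
  out tags center 10;
QUERY

puts query
```

Paste the resulting query into overpass-turbo or POST it to an Overpass API endpoint to see the matching features.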

But what concerns me most is what I found when I had a quick look at the data layer for that lake – it turns out that there are three separate and overlapping lakes! We have the NHD import from June 2011, and an “ArcGIS Exporter” lake from October that year – both of which simply ignore the original lake created way back in February 2009, almost 5 years ago. There’s no point in having 3 slightly different lakes, and if anyone were to try to fix a misspelling, add an attribute or tweak the outline they would have an unexpectedly difficult task. There is, sadly, a continual stream of imports that are often poorly executed and, worse, rarely revisited and fixed, and this is just one case among many.

Mistakes are made, of course, but it’s clear that data problems like these aren’t being noticed and/or fixed in a reasonable timescale – even just this one small example throws up a score of problems that all need to be addressed. Most of my own editing is now focussed on tending and fixing the data that we already have, rather than adding new information from surveys. And as the size of the OpenStreetMap dataset increases, along with the seemingly perpetual and often troublesome importing of huge numbers of features, OpenStreetMap will need to adjust to the increasing priority for such data-gardening.

Using Vagrant to test Chef cookbooks

I’ve previously discussed using Chef as a great way to manage servers, the key part of the process being writing “cookbooks” to describe what software you want installed, how it should be configured, and what services should be running. But a question that I’ve been asked by a few people is how I test my cookbooks as I’m writing them.

Of course, a simple way to test them is to run them on my laptop – which would be great, except that I would end up with all kinds of things installed that I don’t need, and there are things that I don’t want to repeatedly uninstall just to check my cookbooks install them properly. The second approach is to run and re-run them on a server as I go along, but that involves uploading a half-written cookbook to my chef server, running chef-client on the server, seeing if it worked, rinsing and repeating. And I’d have to be brave or foolhardy to do this when the server is in production!

Step forward Vagrant. Vagrant bills itself as a way to:

“Create and configure lightweight, reproducible, and portable development environments.”

but ignore the slogan, that’s not what I use it for. Instead, I treat Vagrant as:

“A command-line interface to test chef cookbooks using virtual machines”

After a few weeks of using chef, I’d started testing cookbooks using VirtualBox to avoid trashing my laptop. But clicking around in the GUI, installing VMs by hand, and running Chef was getting a bit tedious, never mind soaking up disk-space with lots of virtual machine images that I was loath to delete. With Vagrant, however, things become much more straightforward.

Vagrant creates virtual machines using a simple config file, and lets you specify a local path to your cookbooks, and which recipes you want to run. An example config file looks like:

Vagrant.configure("2") do |config|
  config.vm.box = "precise64"
  # Run chef-solo against a local checkout of the cookbooks
  config.vm.provision :chef_solo do |chef|
    chef.cookbooks_path = "/home/andy/src/toolkit-chef/cookbooks"
    chef.add_recipe("toolkit")
  end
  # Expose the guest's web server on the host
  config.vm.network :forwarded_port, guest: 80, host: 11180
end

You then run `vagrant up` and it will create the virtual machine from the “precise64” base box, set up networking, shared folders and any other customisations, and run your cookbooks. If, inevitably, your in-development cookbook has a mistake, you can fix it and run `vagrant provision` to re-run Chef. No need to upload cookbooks anywhere or copy them around, and it keeps your development safely separated from your production machines. Other useful commands are `vagrant ssh` to log into the virtual machine (if you need to poke around to figure out if the recipes are doing what you want), `vagrant halt` to shut down the VM when you’re done for the day, and finally `vagrant destroy` to remove the virtual machine entirely. I do this fairly regularly – I’ve got half a dozen Vagrant instances configured for different projects and so often need to free up the disk space – but given I keep the config files then recreating the virtual machine a few months later is no more complex than `vagrant up` and a few minutes wait.
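For completeness, the `toolkit` recipe named in the Vagrantfile might look something like the following minimal sketch. This is hypothetical – the package, template and service names are invented for the example, not taken from my actual cookbook:

```ruby
# cookbooks/toolkit/recipes/default.rb -- hypothetical minimal recipe

# Install the web server package
package "nginx"

# Render a site config from an ERB template in the cookbook
template "/etc/nginx/sites-available/default" do
  source "nginx-default.erb"
  owner "root"
  mode "0644"
  notifies :reload, "service[nginx]"
end

# Make sure the service starts now and on boot
service "nginx" do
  action [:enable, :start]
end
```

With the forwarded port from the Vagrantfile, whatever this serves on the guest’s port 80 shows up on the host at port 11180.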

Going back to the original purpose of Vagrant, it’s based around redistributing “boxes” i.e. virtual machines configured in a particular way. I’ve never needed to redistribute a box, but once or twice found myself needing a particular base box that’s not available on vagrantbox.es – for example, testing cookbooks on old versions of Ubuntu. Given my dislike of creating virtual machines manually, I found the Veewee project useful. It takes a config file and installs the OS for you (effectively pressing the keyboard on your behalf during the install) and creates a reusable Vagrant base box. The final piece of the jigsaw is then writing the Veewee config files – step forward Bento, which is a pre-written collection of them. Using all these, you can start with a standard Ubuntu .iso file, convert that into a base box with Veewee, and use that base box in as many Vagrant projects as you like.

Finally, I’ve also used Vagrant purely as a command line for VirtualBox – if I’m messing around with a weekend project and don’t want to mess up my laptop installing random dependencies, I instead create a minimal Vagrantfile using `vagrant init`, then `vagrant up` and `vagrant ssh`, and mess around in the virtual machine – it’s much quicker than installing a fresh VM by hand, and useful even if you aren’t using Chef.

Do you have your own way of testing Chef cookbooks? Or any other tricks or useful projects? If so, let me know!

Getting Started With Chef

A little over a year ago I was plugging through setting up another OpenCycleMap server. I knew what needed installing, and I’d done it many times before, but I suspected that there was a better way than having a terminal open in one screen and my trusty installation notes in the other.

Previously I’d taken a copy of my notes, and tried reworking them into something resembling an automated installation script. I got it to the point where I could work through my notes line-by-line, pasting most of them into the terminal and checking the output, with the occasional note requiring actual typing (typically when I was editing configuration files). But to transform the notes into a robust hands-off script would have been a huge amount of work – probably involving far too many calls to sed and grep – and making everything work when it’s re-run or when I change the script a bit would be hard. I suspected that I would be re-inventing a wheel – but I didn’t know which wheel!

The first thing was to figure out some jargon – what’s the name of this particular wheel? Turns out that it’s known as “configuration management”. The main principle is to write code to describe the server setup, rather than running commands. That clicked with me straight away – every time I was adding more software to the OpenCycleMap servers I had this sinking feeling that I’d need to type the same stuff in over and over on different servers – I’d prefer to write some code once, and run that code over and over instead. The code also needs to be idempotent – i.e. it doesn’t matter how many times you run the code, the end result is the same. That’s about the sum of what configuration management entails.
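Idempotence is easy to see with a toy example in plain Ruby – this is just a sketch of the idea, not Chef itself. A function that ensures a line exists in a config file can be run any number of times with the same end result; Chef’s resources behave the same way:

```ruby
require "tmpdir"

# Idempotent "resource": make sure a given line exists in a file.
# Running it once or a hundred times leaves the file in the same state.
def ensure_line(path, line)
  lines = File.exist?(path) ? File.readlines(path, chomp: true) : []
  return if lines.include?(line)           # already converged: do nothing
  File.open(path, "a") { |f| f.puts(line) } # otherwise, make the change
end

count = nil
Dir.mktmpdir do |dir|
  conf = File.join(dir, "app.conf")
  # Run "the recipe" three times, as chef-client might over three days
  3.times { ensure_line(conf, "listen_port = 80") }
  count = File.read(conf).scan("listen_port = 80").length
end

puts count  # => 1, no matter how many times the recipe ran
```

Contrast that with a raw `echo "listen_port = 80" >> app.conf` in a shell script, which appends a duplicate line on every run.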

There are a few open-source options for configuration management, but one in particular caught my eye. Opscode’s Chef is ruby-based, which works for me since I do a fair amount of ruby development and it’s a language that I enjoy working with. Chef is also what the OpenStreetMap sysadmins use to configure their servers, so having people around who use the same system is a bonus.

What started off as a few days effort turned into a massive multi-week project as I learned chef for the first time, and plugged through creating cookbooks for all the components of my server. It was a massive task and took much longer than I’d initially expected, but 18 months on it was clearly worth it – I’d have never been able to run enough servers for all the styles I have now, nor been able to keep up with the upgrades to the software and hardware without it. It’s awesome.

So here are some tips, for those who have their own servers and are in a similar position to the one I was in.

  • How many servers before it’s worth it? Configuration management really comes into its own when you have dozens of servers, but how few are too few to be worth the hassle? It’s a tough one. Nowadays I’d say if you have only one server it’s still worth it – just – since one server really means three, right? The one you’re running, the VM on your laptop that you’re messing around with for the next big software upgrade, and the next one you haven’t installed yet. If you’re running a server with anything remotely important on it, then having some chef-scripts to get a replacement up and running if the first goes up in smoke is a huge help exactly when you need it most.
  • How do you get started with chef? Well, it’s tough, the learning curve is like a cliff. Chef setups have three main parts – the server(s) you’re setting up (the “node”), the machine you’re pressing keys on (the “workstation”) and the confusingly-named “chef server” which is where “nodes” grab their scripts (“cookbooks”) from. It makes sense to cut down the learning, so I’d recommend using the free 5-node trial of their Hosted Chef offering. That way you only need to concentrate on the nodes and workstation setup at first – and when you run out of nodes, there’s always the open-source chef-server if the platform is too expensive.
  • Which recipes should I use? There are loads available on github, and there are links all over the chef website. In general, I recommend avoiding them, at least at first. As I mentioned, the learning curve is cliff-like and while you can do super-complex whizz-bang stuff with chef, the public recipes are almost all vastly overcomplicated, and more importantly, hard to learn from. Start out writing your own – mine were little more than a list of packages to install at first. Then I started adding in some templates, a few script resources here and there, and built up from there as I learned new features. Make sure your chef repository is in git, and that you’re committing your cookbook changes as you go along.
  • Where’s the documentation? I’d recommend following the tutorial to get things all set up, while trying not to worry too much about the details. Then start writing recipes. For that, the resources page on the wiki tells you everything you need to know – start with the package resource, then the template resource, then on to the rest. There’s a whole bunch of stuff that you won’t need for a long time – attributes, tags, searches – so don’t try learning everything in one go.
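To give an idea of scale, a first recipe of the “little more than a list of packages” variety mentioned above can be just a few lines. The package names here are only examples, not from my cookbooks:

```ruby
# cookbooks/base/recipes/default.rb -- a hypothetical starter recipe:
# nothing more than a list of packages to install.
%w[git ntp postgresql].each do |pkg|
  package pkg
end
```

Templates, services and the fancier resources can come later, once this much is working end to end.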

I’ll be writing more about developing and testing cookbooks in the future – it’s a whole subject in itself!

Make the hard things simple, and the simple things occasionally surprisingly hard

I’ve run two OpenStreetMap-themed training courses recently – one for university students, and one for a Local Authority. It’s great helping even more people get started with OpenStreetMap, and as is becoming a bit of a theme, I took the opportunity to observe more people getting started with OSM.

Unlike previous outings to UCL, these two sessions had “getting started” notes that I had written – not a click-by-click tutorial, but notes of what things to try in a particular order. This led to a little embarrassment when some of the seemingly innocuous instructions turned out to be surprisingly hard!

  • “Use the layer switcher to change the map layers” – the layer switcher is hard to find, even when you’re deliberately looking for it. We’re using the default “+” icon on osm.org – the stacked layers icon that I use on opencyclemap.org might be better. But I think best of all would be some rectangular buttons that are always visible. OpenLayers unfortunately makes this surprisingly hard.
  • “Set your home location” – this could do with some love. People want to type in their postcode, or at least search, in order to move the map around. I also found that about half a dozen people set their home location, and pressed edit, which opens Potlatch (despite the tab being “greyed out”) at somewhere unexpected.
  • “Add the person next to you as a friend” – this was a real head-slap moment when I thought about it. Given two people, sitting side by side, how do they add one another as a friend? If they are lucky, they’ve both set their home location within a hundred metres or so and show up on the list of nearby users. If not, the most straightforward way is to go to their own home page, edit the url, replace their username with the other person’s (case-sensitive) username, and then an “add as friend” link appears among all the other links. There’s so much wrong with this it’s embarrassing – or rather, embarrassing that I put the instructions in without thinking things through! A user search, and a button (rather than a link) to add as a friend, would help for a start.

The others are things I noticed people trying to do, all of which are perfectly reasonable.

  • Go to http://www.osm.org. Click help. Admire. Now try to get back to the map, without pressing “Back” or retyping the url.
  • Go to help.openstreetmap.org and click on a username. Now try adding them as a friend.
  • Go to http://www.osm.org. Switch to another map layer. Click the map key. Get a blank tab.

Some maps don’t have a key (I’m guilty of that), but showing an empty panel isn’t helpful. We also found the wrong key appearing beside the different layers, but I can’t reproduce that today. As for the integration with the help centre – I know fine and well how tough it is to integrate separate software products, but users really neither know nor care about it.

And finally some run-of-the-mill observations, mainly of Potlatch 2:

  • The p2 save dialog has too much text above the changeset comment field. People get bored reading it, I think because they aren’t expecting an interruption when they press save.
  • It’s still unclear how to start drawing lines and areas. In fact, most people accidentally start drawing lines and press escape – then later, when they want to draw one, don’t realise that they already know how.
  • People want to add icons to points of interest that are already drawn, but as an area. Maybe we should symbolize areas, or even better, prevent icons from being dropped onto existing areas with the same tags.
  • People get mightily confused when the icons on the map don’t match the icons on the sidebar. Maybe we need to rethink how the sidebar icons appear.
  • Creating other points of interest is hard to figure out (i.e. double-click).
  • If you have a large named area, it can be non-obvious (especially when zoomed in) what’s causing the name to appear.
  • There are useful shortcuts (like J) that don’t appear in the help.
  • There are lots of useful actions that don’t have any GUI for them, unless you count documenting the keypresses on the 8th tab of the Help menu!
  • You can get to the situation where something hasn’t loaded – either the map, or in some cases map_features, and find yourself in a world of hurt, with no warning.
  • One person couldn’t figure out panning the map around while editing. That’s a combination of no buttons, and that if you (tentatively) click on the background, something happens (start drawing a way), so you learn not to click on the background. Of course, to pan the map you need to mousedown to drag it.
  • I’ve never seen anyone using the Potlatch 2 search button, but people often use the main search bar while editing. That often leads to pain when they click on the results.

One of the things that I want to work on within Potlatch 2 is to (mis)use the sidebar to provide context sensitive help. So I imagine when you’re drawing a way, a little square at the bottom of the sidebar says “You’re drawing a line. Double click to stop drawing, click on another way to create a junction” and so on. I think it’ll be especially useful for the first 10 minutes while people get to grips with things.

But, in saying all this, the feedback I get time and time again is how easy it is to get started with OSM, very rarely do I hear that participants found it hard. We can, however, make it even easier!

TileMill, Carto and the Transport Map

A few months ago I started exploring some new technologies from DevelopmentSeed – namely Carto and TileMill. Carto is a CSS-style map description language, similar to Cascadenik, and TileMill is a browser-based application that lets you view maps while you’re designing them.

Initially my efforts were a complete flop – at that point neither Carto nor TileMill had any support for storing the map data in PostGIS, which is a key component of making maps from OSM data. A month later and support was added, so I got cracking – mainly bashing my head against the weirdness of the node package management system NPM. But after a lot of effort and only a little swearing, I got it all working. It’s totally worth it.

Making Maps with Tilemill

Designing maps is hard – both in the amount of complexity in the style rules (there’s lots of colours, widths and so on) and also in the data – every town is different, and the way a map looks varies wildly between town and countryside. So a key thing is to be able to iterate quickly, to make changes and to see how things look in different areas. My previous toolchain – effectively just a modified version of generate_image.py – was a complete pain. To check different areas I’d need to find out the coordinates, pick a zoom level, wait for the image, check it, rinse, lather and repeat. The power of having live-updating maps in TileMill is not to be underestimated!

My first map style produced with Carto and TileMill was the Transport layer. I had originally created the Transport layer using Cascadenik – similar to Carto, it’s a CSS-inspired approach to describing map styling, and much easier than writing individual symbolizers in XML. Carto takes the idea another step forward with sophisticated rules nesting, which I’ve been using more and more in recent months. Since porting the Transport layer, I’ve ported all my other styles to Carto, but more on that some other time. If you’re still creating mapnik style rules by editing XML, I’d advise you check out Carto instead!
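To illustrate the nested rules, here is a small CartoCSS sketch – the selectors, colours and widths below are invented for the example, not taken from the actual Transport style:

```css
#roads {
  line-color: #999;
  line-width: 1;

  /* Filters nest inside the layer, overriding the defaults above */
  [highway = 'motorway'] {
    line-color: #809bc0;
    line-width: 2;

    /* Zoom filters nest too, so the whole cascade stays in one place */
    [zoom >= 12] { line-width: 4; }
  }
}
```

The equivalent in raw mapnik XML would be several separate Rule blocks of symbolizers, each repeating the filter conditions – nesting keeps related rules together and the repetition out.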

OpenStreetMap usability revisited

At the end of September I took a half-day off from the day job and visited UCL. They were again running an Introduction to OpenStreetMap Mapping Workshop for their new masters students. I went along last year and created some great notes on usability for OSM newbies and did the same again this year. It’s rare for me to be able to watch (and help) so many newbies at the same time.

The main difference between last year and this has been the move to Potlatch 2 as the main editor, so I was especially looking forward to seeing how it performed. Also, the students were this year focussing on wheelchair accessibility mapping, which had implications mainly for the detail of our presets compared to this highly-detailed (and relatively unusual) mapping focus.

So here’s the list of notes that I made, in the order that I made them:

  1. We need a deselect button. When you have a feature selected it’s not obvious that to deselect you just click somewhere else on the map
  2. The wiki page on wheelchair mapping is unclear about tagging accessibility of toilets when they are in another amenity (e.g. pub) rather than standalone toilets (amenity=toilets)
  3. One person triggered EntityUI exceptions when zooming in and out. I was surprised to see the exception showing – normally these only show on debug flash plugins
  4. Still confusion on how to add features that aren’t in the grid of icons (Current solution is to double-click to create a POI, suggestion is to have an “other” poi to drag/drop)
  5. The conflict dialog, which you see when two people edit the same road, isn’t particularly helpful. It only gives the id of the feature, which doesn’t help. There’s no method to reconcile the differences (or even see what they are). Yes/No labels on the buttons are bad.
  6. The backgrounds dialog needs better labels. e.g. “Bing Aerial Imagery” since “Bing” is meaningless
  7. Need to drag/drop “new point” (as above – shows you how often it came up!)
  8. Maybe need a “More…” button on the presets to provide some way to reassure people they aren’t definitive and show them how to figure things out
  9. Click-again (that is, clicking twice slowly) should also create a POI
  10. It’s hard to read the road names, especially when they are at an angle
  11. When duplicate nodes are shown, it’s not easy to figure out what they mean
  12. One person made the advanced tag panel go haywire by having multiple new tag entries – and managed it repeatedly
  13. Wiki documentation on bookmakers still sucks. We ran into this last year – there’s a lot of bookmakers in London, and especially if you know a different term for it (gambling shop etc) the documentation is hard to find
  14. Would be great to highlight mistakes, e.g. tagging building=yes on a node. This happened a couple of times when people had a node of an area selected when they started adding tags
  15. Copying tags from nodes to ways (see above)
  16. Newbies shouldn’t be exposed to the footway vs path controversy on the wiki.
  17. Nobody ever finds the search box on the wiki, especially when they are using browser-based find on the Map Features page.
  18. People accidentally mousewheel out too far repeatedly when editing. Maybe we should prevent it at low zooms
  19. barrier = entrance vs building = entrance is unclear
  20. Nobody reads past the first paragraph of the Key pages on the wiki before just skim-reading the rest. Which means a sentence like “Some people use the tag ‘foo = bar’ when they should instead use ‘baz = bar’” becomes “… ‘foo = bar’ …” and that gets used.
  21. The public transport pages on the wiki are dreadful, and newbies shouldn’t be exposed to two alternative tagging schemes. I have my own views on the whole new pointlessly-incompatible schema in any case.
  22. You can end up with both the rails_port search panel and potlatch 2 open at the same time. If you try closing the search panel you get the “leaving the page” warning, when you aren’t actually leaving the page.
  23. The “loading….” label isn’t obvious
  24. Areas of the map that haven’t yet had the data downloaded could be highlighted (or disabled) so that you don’t think it’s just empty.
  25. We need some way of saying “Zoom in!” when you have too much data showing at the given time and flash is crawling to a halt
  26. The data loading could be improved by having a tile-based map call instead of the current wms-like map call.

Some of these things are familiar from previous user testing, some are new, and some will need a bit of discussion to tackle. This is a good opportunity to plug the upcoming Hack Weekend!

Thanks to Dr Patrick Weber for inviting me along.

Dealing with GDAL and Mapnik

Getting GDAL and Mapnik to play nice is a complete pain. Now that I’ve managed it, I’ll give you the solution and explain some of the background.

Mapnik has two plugins for reading image files and using them as background in maps. For OpenCycleMap I currently use the “raster” plugin which reads the files directly, and I need to calculate and supply mapnik with all the coordinates for each image. It’s a bit tedious, but when we set up OpenCycleMap a few years ago it was the only way we could get things to work.

Time moves on, and for new projects (and the massive forthcoming OpenCycleMap upgrade) I’m using the “gdal” plugin. This uses the wonderful (but sometimes infuriating) GDAL libraries to read the images and use any geo-information that’s embedded within them. Saves a lot of hassle, and when you’re dealing with tens of thousands of raster images then things like .vrt files are a godsend.

However, gdal has a secret lurking deep within its sourcecode, and it’s all to do with libtiff. libtiff is the library for reading .tif files, which are normally limited to 4Gb in size. There’s a new version of libtiff that deals with giant tiff files that are greater than 4Gb (known as BigTIFF). The version of libtiff that comes with Ubuntu doesn’t have BigTIFF support, so the GDAL packages use their own internal copy of the library. With version 0.8.0 of gdal, a feature was added to throw an error if multiple versions of libtiff were found active at the same time (in 0.8.1 this was downgraded to a warning). But for most applications using gdal there’s no big problem – they use the gdal libraries, and hence the BigTIFF version of libtiff. Meanwhile the standard libtiff (which loads of other things need – trust me, don’t try uninstalling it) is left out of the picture and unused.

The problem is if your application – say, mapnik – compiles against both the system libtiff and gdal-with-BigTiff. If you’re using gdal before 0.8.0 then you might get silent corruption of the output; if you’re using 0.8.0 then mapnik will keep crashing with “ERROR 1: WARNING ! libtiff version mismatch : You’re linking against libtiff 3.X but GDAL has been compiled against libtiff >= 4.0.0”

The trick to this is to avoid using any ubuntu packages of gdal – whether from the ubuntugis PPA repositories or anywhere else – until someone somewhere sorts it all out (probably in some future Ubuntu libtiff will have BigTiff support built-in). In the meantime, grab yourself gdal from source (0.8.0 is fine, btw) and configure it with

./configure --with-libtiff=/usr/lib

This forces gdal to use the system libtiff, and prevents any corruptions or segfaults in applications (like mapnik, which you’ll need to recompile too). It means you don’t get BigTiff support, but hey-ho. But most importantly, you can stop spending all your life juggling gdal versions trying to find which particular combination of packages and PPAs work (hint: none of them do). Thanks to Dane for the final clue – I’ve spent days of my life repeatedly battling this!