Tag Archives: OpenStreetMap

Open Source Mapping

OpenStreetMap Carto Workshop

At State of the Map Europe I ran a workshop about openstreetmap-carto, the stylesheets that power the Standard map layer on OpenStreetMap.org and many hundreds of other websites. The organisers have published the video of the workshop:

Thanks to the organising team for inviting me to run this workshop – it was certainly well received by the audience, and I spent the rest of the day discussing the project with other developers.

I continue to be surprised by two aspects of openstreetmap-carto. One is how much work is involved in making significant changes to the cartography – after a year and a half, to the casual observer not much has happened! But on the other hand, we now have a large group of people who are commenting on the issues and making pull requests. As of today there are 29 people whose contributions show up in the style, with over 700 issues opened and over 220 pull requests made. Much of the work is now done by Paul Norman and Matthijs Melissen who help review pull requests for me and do almost all of the major refactoring. It’s great to have such a good team of people working on this together!

A sneak peek at Thunderforest Lightning maps

Here are a few screenshots from my new Lightning tile-server:


A refresh of the Transport Style, built on a brand new, state-of-the-art vector tiles backend.


Thunderforest Lightning makes it easy to create custom map styles sharing vector tile backends. Here’s a Dark variant for the Transport layer.


Vector tiles bring lots of other benefits – like high-performance retina maps!

Development is going full-steam, and Lightning will be launching soon. Check back for more updates!

Tending the OpenStreetMap Garden


Yesterday I was investigating the OpenStreetMap elevation tag, when I was surprised to find that the third most common value is ‘0.00000000000’! Now I have my suspicions about any value below ten, but there are 13,832 features in OpenStreetMap that have their elevation mapped to within 10 picometres – roughly one sixth of the diameter of a helium atom – of sea level. Seems unlikely, to be blunt.

It could of course be hundreds of hyper-accurate volunteer mappers, but I immediately suspect an import. Given the spurious accuracy and the tendency to cluster around sea level, I also suspect it’s a broken import where these picometre-accurate readings are more likely to mean “we don’t know” than “exactly at sea level”. Curious, I spent a few minutes using the Overpass API and found an example – Almonesson Lake. The NHD prefix on the tags suggests it came from the National Hydrography Dataset and so, as suspected, not real people with ultra-accurate micrometers.
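As a rough sketch of the arithmetic and the lookup described above: the precision a value claims can be read off its number of decimal places, and an Overpass QL query can list the features carrying that exact value. The function names are made up for illustration, and the query assumes the elevation is stored under the `ele` key in metres; it is one possible way to find such features, not necessarily the query I used at the time.

```python
from decimal import Decimal


def implied_precision_metres(value: str) -> Decimal:
    """Return the size of the last decimal place of a tag value, in metres.

    Assumes the value is a plain decimal number of metres, e.g. "0.00000000000".
    """
    exponent = Decimal(value).as_tuple().exponent  # -11 for eleven decimal places
    return Decimal(1).scaleb(exponent)


def overpass_query_for_ele(value: str, timeout_s: int = 60) -> str:
    """Build an Overpass QL query for nodes, ways and relations with ele=<value>.

    Assumption: elevation lives under the "ele" key.
    """
    tag_filter = f'["ele"="{value}"]'
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        f"(node{tag_filter};way{tag_filter};relation{tag_filter};);\n"
        "out tags center;"
    )


suspicious = "0.00000000000"
# Convert metres to picometres (1 m = 10**12 pm)
precision_pm = implied_precision_metres(suspicious) * Decimal(10) ** 12
print(f"{suspicious!r} claims a precision of {int(precision_pm)} pm")
print(overpass_query_for_ele(suspicious))
```

The query text can be pasted into any Overpass front end; nothing here contacts a server.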

But what concerns me most is when I had a quick look at the data layer for that lake – it turns out that there are three separate and overlapping lakes! We have the NHD import from June 2011. We have an “ArcGIS Exporter lake” from October that year, both of which simply ignore the original lake created way back in Feb 2009, almost 5 years ago. There’s no point in having 3 slightly different lakes, and if anyone were to try to fix a misspelling, add an attribute or tweak the outline they would have an unexpectedly difficult task. There is, sadly, a continual stream of imports that are often poorly executed and, worse, rarely revisited and fixed, and this is just one case among many.

Mistakes are made, of course, but it’s clear that data problems like these aren’t being noticed and/or fixed in a reasonable timescale – even just this one small example throws up a score of problems that all need to be addressed. Most of my own editing is now focussed on tending and fixing the data that we already have, rather than adding new information from surveys. And as the size of the OpenStreetMap dataset increases, along with the seemingly perpetual and often troublesome importing of huge numbers of features, OpenStreetMap will need to adjust to the increasing priority for such data-gardening.

Make the hard things simple, and the simple things occasionally surprisingly hard

I’ve run two OpenStreetMap-themed training courses recently – one for university students, and one for a Local Authority. It’s great helping even more people get started with OpenStreetMap, and as is becoming a bit of a theme, I took the opportunity to observe more people getting started with OSM.

Unlike previous outings to UCL, these two sessions had “getting started” notes that I had written – not a click-by-click tutorial, but notes of what things to try in a particular order. This led to a little embarrassment when some of the seemingly innocuous instructions turned out to be surprisingly hard!

  • “Use the layer switcher to change the map layers” – the layer switcher is hard to find, even when you’re deliberately looking for it. We’re using the default “+” icon on osm.org – the stacked layers icon that I use on opencyclemap.org might be better. But I think best of all would be some rectangular buttons that are always visible. OpenLayers unfortunately makes this surprisingly hard.
  • “Set your home location” – this could do with some love. People want to type in their postcode, or at least search, in order to move the map around. I also found that about half a dozen people set their home location, and pressed edit, which opens Potlatch (despite the tab being “greyed out”) at somewhere unexpected.
  • “Add the person next to you as a friend” – this was a real head-slap moment when I thought about it. Given two people, sitting side by side, how do they add one another as a friend? If they are lucky, they’ve both set their home location within a hundred metres or so and show up on the list of nearby users. If not, the most straightforward way is to go to their own home page, edit the url, replace their username with the other person’s (case-sensitive) username, and then an “add as friend” link appears among all the other links. There’s so much wrong with this it’s embarrassing – or rather, embarrassing that I put the instructions in without thinking things through! A user search, and a button (rather than a link) to add as a friend, would help for a start.

The other things are things I noticed people trying to do, which are perfectly reasonable.

  • Go to http://www.osm.org. Click help. Admire. Now try to get back to the map, without pressing “Back” or retyping the url.
  • Go to help.openstreetmap.org and click on a username. Now try adding them as a friend.
  • Go to http://www.osm.org. Switch to another map layer. Click the map key. Get a blank tab.

Some maps don’t have a key (I’m guilty of that), but showing an empty panel isn’t helpful. We also found the wrong key appearing beside the different layers, but I can’t reproduce that today. As for the integration with the help centre – I know fine and well how tough it is to integrate separate software products, but users really neither know nor care about it.

And finally some run-of-the-mill observations, mainly of Potlatch 2

  • The p2 save dialog has too much text above the changeset comment field. People get bored reading it, I think because they aren’t expecting an interruption when they press save.
  • It’s still unclear how to start drawing lines and areas. In fact, most people accidentally start drawing lines, and press escape, without realising later on when they want to draw one that they already know how.
  • People want to add icons to points of interest that are already drawn, but as an area. Maybe we should symbolize areas, or even better, prevent icons from being dropped onto existing areas with the same tags.
  • People get mightily confused when the icons on the map don’t match the icons on the sidebar. Maybe we need to rethink how the sidebar icons appear.
  • Creating other points of interest is hard to figure out (i.e. double-click).
  • If you have a large named area, it can be non-obvious (especially when zoomed in) what’s causing the name to appear.
  • There’s useful shortcuts (like J) that don’t appear in the help.
  • There’s lots of useful actions that don’t have any GUI for them, unless you count documenting the keypresses on the 8th tab of the Help menu!
  • You can get to the situation where something hasn’t loaded – either the map, or in some cases map_features, and find yourself in a world of hurt, with no warning.
  • One person couldn’t figure out panning the map around while editing. That’s a combination of no buttons, and that if you (tentatively) click on the background, something happens (start drawing a way), so you learn not to click on the background. Of course, to pan the map you need to mousedown to drag it.
  • I’ve never seen anyone using the Potlatch 2 search button, but people often use the main search bar while editing. That often leads to pain when they click on the results.

One of the things that I want to work on within Potlatch 2 is to (mis)use the sidebar to provide context sensitive help. So I imagine when you’re drawing a way, a little square at the bottom of the sidebar says “You’re drawing a line. Double click to stop drawing, click on another way to create a junction” and so on. I think it’ll be especially useful for the first 10 minutes while people get to grips with things.

But, in saying all this, the feedback I get time and time again is how easy it is to get started with OSM; very rarely do I hear that participants found it hard. We can, however, make it even easier!

OpenStreetMap Hack Weekend

Last weekend we held another Hack Weekend for OpenStreetMap, and I thoroughly enjoyed it from start to finish. Especially the start, which involved sitting outside on a warm spring evening with a cold beer and unwinding!

This was probably one of the largest Hack Weekends that we’ve run so far – I counted 25 people at one point – and I volunteered to help anyone who was interested in using git, developing Potlatch2 and improving the Rails Port (aka the OpenStreetMap website). As part of this I ran a few short workshops which were surprisingly well attended – I’d expected 2 or 3 people for each one but ended up with 10-15 each instead! I’ll be interested to see what workshops people are interested in for the next Hack Weekend.

When I wasn’t running workshops or helping other people, I was working with Richard Fairhurst on the Potlatch 2.0 release – and this was the point where we made it the default editor on the OpenStreetMap website. It’s been painful for the last few months watching thousands of people learning to use potlatch1, so we’ve just made a big step in making OpenStreetMap easier to get started with. The news made it onto OpenGeoData and even ReadWriteWeb. Development doesn’t stop at 2.0, of course – we’ve got lots of in-progress work on branches (including the long-awaited History dialog that I’ve been working on) and it’ll be good to see them being merged in when they are ready. We also managed to spot a few bugs within the first few hours of the new release!

It was also great to see a bunch of people committing code to projects they’d never worked on before – one of the main reasons we run the weekends. There was lots of work on the Rails Port, including improving the layout on mobile screens and working round bugs with postgres 9. But I’ve no idea what everyone was up to at the far end of the room – it was such a big, busy weekend that I couldn’t keep track! One thing that was prevalent was people picking up git for the first time, and our recent migration to using git for Potlatch2 proved really useful when juggling which features to include in 2.0 and which to leave for further development.

I’m looking forward to the next Hack weekend, which Matt is already organising. If you’re tempted to come help develop OSM and learn something new, you should come along!

Tweak a little here, fix a little there

Another round of updates to the OpenCycleMap cartography was released a week ago, after a few days of local testing, bug investigating and general “technical-debt” payments.

The biggest fix is that I’ve finally tracked down what was causing all kinds of problems with riverbanks. The OpenCycleMap code dates back from long in the past when the riverbank tag was first introduced, and since then it’s greatly expanded and is now heavily used in multipolygons. There was a bug with some code thinking they were linear features and other code treating them as polygons – which used to work fine, but was recently leading to giant triangles lurching across the landscape. Thankfully it turned out not to be a problem with the relation-handling code in osm2pgsql – I had enough of that last year!

Riverbanks gone mad

A major feature of this update is the map now treats points of interest – like shops, pubs and so on – equally, whether they are tagged as nodes or as areas. So in hyper-detailed places where shop nodes are being replaced by building outlines the names and icons will now show up properly. You can see some examples around Peckham where Tom Chance has been hard at work.

Another ‘technical debt’ problem was regarding the “cycle node networks” widely used in the Netherlands and Belgium. When I originally tried rendering the icons at the junctions mapnik blew up – there was a bug with running ShieldSymbolizer on points. Even though this was fixed in mapnik years ago it was only last summer that I started using a new enough version, and it’s taken until now for me to reinstate (and redesign) the icons. But the new circles certainly look nicer than just numbers on the map, so it’s been worth the wait!

Node network

Pedestrian areas are finally drawn properly, and cafés have been added. Bike shops get a new, clearer icon and suburbs and localities are shown. On the attention to detail front, at medium zoom levels national cycle routes are consistently prioritised over regional and local routes, and place names should behave a bit more predictably as you zoom in. And finally street labels won’t bend back on themselves so much and should therefore be easier to read.

Tangled Mess

The server is chugging away at refreshing all the tiles – it’ll take a week or so to get through them all, but you can see the updates filtering through already in the most popular areas.

Many thanks go to MotionX for supporting the project and this round of updates in particular, and to everyone who diligently filed bug reports and (gently) encouraged me to fix them!

Tiger Edited Map resurrected

Recently I’ve been working with MapQuest to rebuild the OpenStreetMap “Tiger Edited Map”. It was publicly released last week (blog, link).

Tiger Edited Map

The original map was created by Matt while he was at CloudMade, but it disappeared not long after we left at the end of last year. This is a from-scratch reimplementation with a few bonus features – it’s updated every few minutes, and the stylesheets are available on GitHub. It uses osm2pgsql with extended attribute information to enable styling by openstreetmap id and date ranges (see the nitty-gritty here) – and a word to the wise: don’t turn on extended attributes for nodes unless you have infinite hard drive space and patience to go with it!

It’s great to see how much progress there’s been this year, and it shows where we need to check for the usual TIGER issues. One of the interesting things for me is that it shows a recognisable editing pattern across the entire US – the major roads have all been edited (most multiple times), as have vast swathes of urban areas – enough that OSM is a distinct enough dataset from TIGER to stand out on its own. Hopefully this will inspire more people to fix up the streets in their own areas and drive the quality of OSM data in the US upwards – step by step. My next plan along these lines is to work on the Rapid Assessment Tool I made some time ago – moving along the QA debate from the origins of the data (I believe we’re often too hung up on the word “TIGER”) and onto assessing how good OSM data is on its own merits.

If anyone has any suggestions for improvements to the style – especially changes to the detection algorithm, or similar ideas for other regions – then I’d love to see either forks from the git sources or even plain old comments below!

Quick and dirty usability testing of OSM

Last week I joined Ant and Deb from MapQuest in order to help out with the UCL mapping party. On the Wednesday I went out with some new Masters students and got soaked in the rain around Camden, but the main interest for me was the following day when we all gathered in the computer lab to upload the newly collected data. While I was helping out I was also scribbling furiously whenever I found someone stuck on some aspect of OSM that I hadn’t expected.

UCL student mapping party

I was briefly worried that there would be a flurry of activity while they logged on and that I’d miss most of it, but actually the account creation was so long and tortuous that it gave me plenty of time to watch. Silver linings, etc, I guess. I took notes, and so here they are, in the order I wrote them down.

  1. Where did the email go? – The biggest hurdle and the one that spread them out was confirming their email. Given that the OSM servers are on the same campus as we were, it took an extraordinary amount of time for them to appear. But the issue here was that on the user signup page there was no indication as to which email address the confirmation email was sent to, and one person was worried there was a typo. It also made it impossible for me to check that there wasn’t a typo in their (to them) brand new address.
  2. Nobody reads the CTs, and everyone ticks the PD box without reading it either – I’ll win no friends with this observation, but I saw nobody scrolling the CTs box, and everyone reflexively ticked the box beside the agree button. I’m guessing they all thought it was a “have you read the above legal stuff” which you normally get on such forms.
  3. Send another confirmation email – There’s no way to trigger sending another copy of the confirmation email. Sometimes they go missing, and at least if there was a button the frustration levels would go down.
  4. Not obvious what the settings page is for – After confirming their email the users end up on the settings page, where almost the first thing it shows you is your email address and a box to put a new email address into. That confused a lot of people. Things like add a friend, set a home location, read some getting started notes etc would be more useful.
  5. Highlight unrecognised tags – I found one guy who had, and it’s not clear how, ended up with all his name tags with a capital N. It would be better if unexpected tags like these were highlighted while editing.
  6. Anxiety over tags missing from autocomplete lists – on two occasions I had people worried that what they were typing (in both cases “office”) wasn’t in the autocomplete list. I had to explain that there are things on Map Features (and elsewhere on the wiki) that aren’t in the list, and that’s not a problem.
  7. Confusion over the preset dropdown (10a and 10b on this image) – Three people struggled to make it stay open (i.e. click – hold – move – release). One guy kept selecting different things, and didn’t realise it was adding more tags and changing one (amenity) that he’d already set, until I pointed it out. I had to explain the small icon (10a) was a button that changed what was on the dropdown. Most of the icons used in 10a weren’t understood (car and bike were good, the football and postbox less so). Many people made the same mistake of adding a POI, adding the correct tags, and then worrying that it said “(no preset)” and tried to find the correct thing in the menu – i.e. misunderstood the purpose of it.
  8. Couldn’t find double-click – Since they were entering POIs they’d already collected, they rapidly found themselves without an appropriate one on the POI panel and searched the wiki. With the tags in hand, they were then stumped on how to add a blank POI. One guy worked out he could change the tags on an existing one, but either instructions (“double click”) or a multi-purpose / “blank” POI icon would be better.
  9. Couldn’t add extra tags – three or four times people needed the + icon pointed out to them.
  10. Map Features – long descriptions – most people found themselves on Map Features reading the key, value and short description, but I didn’t see anyone realise that they could click the value for more details. This should perhaps go (automatically) onto the end of the short description text as a “More details…” link.
  11. Confusion with abandoned features – repeatedly people found proposed and/or abandoned features, and similar wiki-works-in-progress. As well as not understanding, they also didn’t care, and didn’t read the page either – they were just skim-reading to find the tags they needed. I’d lean strongly towards clearing off the 3-year-old abandoned pages, but I realise there are “wiki-historians” who want to keep everything for posterity.
  12. Search beyond Map Features – most people searched up and down the Map Features page using the browser-based search (Ctrl+F). They were then stuck when they couldn’t find the thing they were looking for, and had to be pointed towards the search box to search the rest of the wiki. Again, it wasn’t clear that there are plenty of things obscure enough to not be on the main list. Also, “Also known as” and “similar to” and “see also” sections of the tag documentation are worth their weight in gold. A surprising number of pages don’t have them.

A lot of the most interesting stuff I found was regarding Potlatch 1, and (fortunately?) very little of it applies to Potlatch2 since the UI has been overhauled. I’d love to also work on the Friends functionality of the website, since when the students started “friending” each other, pretty much nothing happened. We could show friends’ edits, diary entries etc. One thing that stood out for me though, was we should remove the PD tickbox from the CTs. It’s added confusion if you read it, and most people don’t so the point of it is moot. It’s not on the critical path for signup so it shouldn’t be in the signup flow at all. It can live in the user settings page or somewhere similar. It’s not legally binding and it’s not working as a straw poll either. Finally, it would be great if there was more stuff possible before the email was confirmed, like adding friends – or even links to introduction videos or something like that.

I’ll leave you with the best and least-expected I-never-thought-of-that example of the day. I watched one student find the entry in Map Features for the shop that he wanted to add. He highlighted the icon, right clicked and selected Copy, then changed tabs to Potlatch and right clicked in order to paste the icon where he wanted it to appear.

If only, my friend, if only.

Thanks to Muki Haklay and Thomas Koukoletsos from UCL for inviting us along. If anyone has any similar opportunities for me to come and watch people learning OSM, please get in touch.

Map rendering on EC2

Over the last two years I’ve been running the OpenCycleMap tileserver on Amazon’s EC2 service. Plenty of other people do the same, and I get asked about it a lot when I’m doing consulting for other companies. I thought it would be good to take some time to say a bit about my experiences, and maybe this will be useful to you at some point.

EC2 is great if you have a need for lots and lots of computing power, and your need for using CPUs fluctuates. At its best, you have a task that needs hundreds of CPUs, but only for a few hours. So you can spin up as many instances as you like, do your task, and switch them back off again. Map rendering, and here I’m talking about mapnik/mod_tile rendering of OpenStreetMap data, initially seems to hit that use-case – generating map tiles involves lots of processing of the map data, and then you have your finished map images which are trivial to serve.

But that’s not really the case, it turns out. After you’ve finished experimenting with small areas and start moving to a global map, you find that disk IO is by far the most important thing. There are two stages to the data processing – import and rendering. During import you take a 10GB OpenStreetMap planet file and feed it into PostGIS with osm2pgsql. You want to use osm2pgsql --slim (to allow diff updates), but that involves huge amounts of writing and reading from disk for the intermediate tables. It can take literally weeks to import. When you’re rendering, renderd lifts the data from the database, renders it, writing the tiles back to disk, and then mod_tile reads the disk store to send the tiles to the client. All in all, lots of disk activity. And hugely more if you mention contours or hillshading.
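To make the import step a little more concrete, here is a minimal sketch that assembles the osm2pgsql command line for an initial slim-mode import and for a later diff update. The helper name, the database name `gis` and the file names are placeholders I’ve picked for illustration; the flags are the long-form options from the osm2pgsql documentation, but check `osm2pgsql --help` for your version before relying on them.

```python
import shlex
from typing import List, Optional


def osm2pgsql_command(
    input_file: str,
    database: str = "gis",           # placeholder database name
    cache_mb: Optional[int] = None,  # node cache size in MB, if you want to set it
    append: bool = False,            # False: initial import, True: apply a diff
) -> List[str]:
    """Build an argv list for osm2pgsql in slim mode (needed for diff updates)."""
    args = ["osm2pgsql", "--slim"]
    args.append("--append" if append else "--create")
    args += ["--database", database]
    if cache_mb is not None:
        args += ["--cache", str(cache_mb)]
    args.append(input_file)
    return args


# Initial import of a (hypothetically named) planet file
print(shlex.join(osm2pgsql_command("planet-latest.osm.bz2", cache_mb=2000)))
# Later: apply a (hypothetically named) diff to the same database
print(shlex.join(osm2pgsql_command("changes.osc.gz", append=True)))
```

The list can be handed straight to `subprocess.run(args, check=True)`; the sketch only prints the commands so that nothing is run by accident.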

Which wouldn’t be too bad, except the disks on EC2 suck. It’s not a criticism, since it’s an Elastic Compute Cloud, not an Elastic Awesome-Disks Cloud. It’s a system designed for doing calculations, not handling reading and writing huge datasets to and from disk. So their virtual disks are much slower than you would like or expect from the rest of the specs. On the opencyclemap “large” EC2 instance, roughly one core is being used for processing, and the rest is all blocked on IO. Although it’s marked as having “high” IO performance on their instance types page, I’d suggest for “moderate” and “high” you should read “dreadful” and “merely poor” respectively.

Amazon’s S3 is their storage component of their Web Services suite. So instead of thrashing the disks on EC2, how about storing tiles on S3? It’s possible, but the main drawback is that it makes it much, much harder to generate tiles on-the-fly. If you point your web app at an S3 bucket there’s no way that I know of to pass 404s onto an EC2 instance to fulfil. If you’re happy with added latency, then you could still run a server that queries S3 before deciding to render, and copy the output to S3, but I can’t imagine that being faster than using EC2’s local storage. You can certainly use S3 to store limited tilesets, such as limited geographical areas or a limited number of zooms. But pre-generating a full planet’s worth of z18 tiles would take up terabytes of space, and only a vanishingly small number of tiles would ever be served.
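A back-of-the-envelope calculation shows where the “terabytes” comes from. This sketch assumes the usual web-map tile pyramid, where each zoom level has four times as many tiles as the one before, and an average stored tile size of 1,000 bytes. That size is purely a guess for illustration; real tiles vary enormously, with empty ocean tiles being tiny and city centres much larger.

```python
def tiles_at_zoom(zoom: int) -> int:
    """Number of tiles covering the world at one zoom level (2**zoom x 2**zoom)."""
    return 4 ** zoom


def tiles_up_to_zoom(max_zoom: int) -> int:
    """Total number of tiles for zoom levels 0..max_zoom inclusive."""
    return sum(tiles_at_zoom(z) for z in range(max_zoom + 1))


# Assumption for illustration only: average bytes per stored tile
ASSUMED_BYTES_PER_TILE = 1_000

z18_tiles = tiles_at_zoom(18)
z18_terabytes = z18_tiles * ASSUMED_BYTES_PER_TILE / 1e12

print(f"z18 alone: {z18_tiles:,} tiles, roughly {z18_terabytes:.0f} TB at the assumed size")
print(f"z0 to z18: {tiles_up_to_zoom(18):,} tiles")
```

Even if the assumed size is off by a large factor, z18 alone is tens of billions of tiles, which is why rendering on demand and caching only what is requested is the usual approach.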

Finally, there is the cost of running a tileserver. Although Amazon are quite cheap if you want a hundred servers for a few hours, the costs start mounting if you have only one server running 24 hours a day – which is what you need from a tileserver or any other kind of webserver. $0.34 per hour seems reasonable until you price for the first four weeks uptime, where all kinds of non-cloud providers come into play, simply paying monthly rent on a server instead. Factoring in bandwidth costs for a moderately well-used tileserver can make it mightily expensive. Any extras can be added too – EBS if you want your database to survive the instance being pulled, or S3 storage.
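The four-week figure is easy to work out. This sketch uses the $0.34 hourly rate quoted above and nothing else: bandwidth, EBS and S3 are deliberately left out, so the real bill would be higher. The function name is just for illustration.

```python
from decimal import Decimal

HOURLY_RATE_USD = Decimal("0.34")  # per-hour instance price quoted in the post
HOURS_PER_DAY = 24


def always_on_cost(days: int, hourly_rate: Decimal = HOURLY_RATE_USD) -> Decimal:
    """Instance-only cost of keeping one server running around the clock."""
    return hourly_rate * HOURS_PER_DAY * days


four_weeks = always_on_cost(28)
print(f"One instance, four weeks, instance hours only: ${four_weeks}")
```

That instance-only total is the number to set against a flat monthly rent when comparing with a non-cloud provider.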

EC2 is, more or less, exactly not what you want from a tileserver. Expensive to run, slow disks. So why is it popular? First off is buzzwords – cloud, scalable and so on. If you aren’t careful you can easily empty the piggybank on running a handful of tileservers long before you’re running enough to do proper demand-based scaling changing from hour to hour during the day. If you’re trying to “enterprise” your system you’ll worry about failovers long before you need such elastic scaling, and you need your failovers and load balancers running 24×7 too. Second is for capacity planning – if you want to do no planning whatsoever, then EC2 is great! But it’s much cheaper to rent a few servers for the first couple of months, and add more to your pool when (if?) your tileserver gets popular. But there is a third reason that is quite cool – for people like Development Seed’s TileMill – you can give your tileserver image to someone else extremely easily, and it’s their credit card that gets billed, and they can turn on and off as many servers as they like without hassling you.

I’ve been setting up a new tileserver for OpenCycleMap that’s not on EC2, and I’ll post here again later with details of how I got on. I’m also working on another couple of map styles – with terrain data, of course, and if you’re interested in hearing more then get in touch.

So in summary

  • I’d recommend EC2 if you want to pre-generate large numbers of tiles (say a continent down to z16), copy them somewhere and then switch off the renderer
  • I’d consider EC2 for ultra-large setups where you are running 5 or more tileservers already, but only as additional-load machines
  • I wouldn’t recommend EC2 if you want to run an on-the-fly tileserver. Which is what most people want to do.

Any thoughts? Running a tileserver on EC2 and disagree? Let me know below.