Map rendering on EC2

Over the last two years I’ve been running the OpenCycleMap tileserver on Amazon’s EC2 service. Plenty of other people do the same, and I get asked about it a lot when I’m doing consulting for other companies. I thought it would be good to take some time to say a bit about my experiences, and maybe this will be useful to you at some point.

[Image: OpenCycleMap tile]

EC2 is great if you have a need for lots and lots of computing power, and that need fluctuates. At its best, you have a task that needs hundreds of CPUs, but only for a few hours. So you can spin up as many instances as you like, do your task, and switch them back off again. Map rendering – and here I’m talking about mapnik/mod_tile rendering of OpenStreetMap data – initially seems to fit that use-case: generating map tiles involves lots of processing of the map data, and then you have your finished map images, which are trivial to serve.
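
That launch-compute-terminate workflow is trivial to script. Here is a rough sketch using today’s boto3 library – the AMI ID, instance type and counts are placeholders rather than anything from my setup:

```python
# Rough sketch: start a batch of instances for a one-off rendering job,
# then switch them all off again. Every identifier here is a placeholder.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-00000000",   # placeholder: a pre-built rendering AMI
    InstanceType="m1.large",  # placeholder instance type
    MinCount=20,
    MaxCount=20,
)
instance_ids = [i["InstanceId"] for i in response["Instances"]]

# ... run the job and copy the results somewhere safe ...

# Terminate everything so the hourly meter stops running.
ec2.terminate_instances(InstanceIds=instance_ids)
```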

But that’s not really the case, it turns out. After you’ve finished experimenting with small areas and start moving to a global map, you find that disk IO is by far the most important thing. There are two stages to the data processing – import and rendering. During import you take a 10GB OpenStreetMap planet file and feed it into PostGIS with osm2pgsql. You want to run osm2pgsql with --slim (to allow diff updates), but that involves huge amounts of writing and reading from disk for the intermediate tables. It can take literally weeks to import. When you’re rendering, renderd lifts the data from the database, renders it and writes the tiles back to disk, and then mod_tile reads the disk store to send the tiles to the client. All in all, lots of disk activity. And hugely more again if you add contours or hillshading.
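
For reference, the import step boils down to a single long-running osm2pgsql invocation. A minimal sketch of driving it from Python – the database name, cache size and planet file path are placeholders for whatever your setup uses:

```python
# Kick off the --slim import; expect this to run for a very long time on EC2.
import subprocess

subprocess.check_call([
    "osm2pgsql",
    "--slim",                  # keep intermediate tables so diff updates work later
    "--cache", "4096",         # MB of RAM for the node cache (placeholder value)
    "--database", "gis",       # conventional database name for mapnik rendering
    "planet-latest.osm.bz2",   # placeholder path to the planet file
])
```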

Which wouldn’t be too bad, except the disks on EC2 suck. It’s not a criticism, since it’s an Elastic Compute Cloud, not an Elastic Awesome-Disks Cloud. It’s a system designed for doing calculations, not for reading and writing huge datasets to and from disk. So their virtual disks are much slower than you would like or expect from the rest of the specs. On the OpenCycleMap “large” EC2 instance, roughly one core is busy with processing and the rest is all blocked on IO. Although it’s marked as having “high” IO performance on their instance types page, I’d suggest that for “moderate” and “high” you should read “dreadful” and “merely poor” respectively.
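
If you want to see it for yourself, even a crude sequential-write test on the instance’s local storage shows the gap. This is only a sanity-check sketch (the /mnt path is an assumption about where the ephemeral disk is mounted), not a proper benchmark:

```python
# Crude check: time writing 1 GB of zeroes to the local disk and report MB/s.
import os
import time

PATH = "/mnt/io_test.bin"   # assumes ephemeral storage is mounted on /mnt
CHUNK = b"\0" * (1 << 20)   # 1 MiB per write
TOTAL_MB = 1024

start = time.time()
with open(PATH, "wb") as f:
    for _ in range(TOTAL_MB):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())    # make sure the data really reaches the disk
elapsed = time.time() - start

print("wrote %d MB in %.1f s (%.1f MB/s)" % (TOTAL_MB, elapsed, TOTAL_MB / elapsed))
os.remove(PATH)
```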

Amazon’s S3 is the storage component of their Web Services suite. So instead of thrashing the disks on EC2, how about storing tiles on S3? It’s possible, but the main drawback is that it makes it much, much harder to generate tiles on-the-fly. If you point your web app at an S3 bucket there’s no way that I know of to pass 404s onto an EC2 instance to fulfil. If you’re happy with added latency, then you could still run a server that queries S3 before deciding to render, and copies the output to S3, but I can’t imagine that being faster than using EC2’s local storage. You can certainly use S3 to store limited tilesets, such as limited geographical areas or a limited number of zooms. But pre-generating a full planet’s worth of z18 tiles would take up terabytes of space, and only a vanishingly small fraction of those tiles would ever be served.
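
The “query S3 before deciding to render” approach would look roughly like this. It’s only a sketch using boto3, with a hypothetical bucket name and a render_tile() stub standing in for a call into the real rendering stack:

```python
# Serve a tile from S3 if it exists, otherwise render it, push it to S3
# and serve it. Bucket name and render_tile() are placeholders.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "my-tile-bucket"   # placeholder bucket

def render_tile(z, x, y):
    # Stand-in for the real mapnik/renderd call; should return PNG bytes.
    raise NotImplementedError("hook this up to your rendering stack")

def get_tile(z, x, y):
    key = "%d/%d/%d.png" % (z, x, y)
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=key)
        return obj["Body"].read()   # already rendered: serve straight from S3
    except ClientError as e:
        if e.response["Error"]["Code"] != "NoSuchKey":
            raise
    # Not in the bucket: render locally, copy the result to S3, then serve it.
    # The extra S3 round trip on every miss is the latency cost mentioned above.
    png = render_tile(z, x, y)
    s3.put_object(Bucket=BUCKET, Key=key, Body=png, ContentType="image/png")
    return png
```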

Finally, there is the cost of running a tileserver. Although Amazon are quite cheap if you want a hundred servers for a few hours, the costs start mounting if you have only one server running 24 hours a day – which is what you need from a tileserver or any other kind of webserver. $0.34 per hour seems reasonable until you price up the first four weeks of uptime, at which point all kinds of non-cloud providers come into play, where you simply pay monthly rent on a server instead. Factoring in bandwidth costs for a moderately well-used tileserver can make it mightily expensive. Then there are the extras – EBS if you want your database to survive the instance being pulled, or S3 storage.
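
To put rough numbers on the always-on case, using the $0.34/hour figure above – the bandwidth price and traffic volume below are assumptions for illustration, not real quotes:

```python
# Back-of-envelope monthly cost for one always-on instance.
HOURS_PER_MONTH = 24 * 30
instance = 0.34 * HOURS_PER_MONTH   # ~$245/month before storage or traffic

BANDWIDTH_PER_GB = 0.15             # assumed outbound $/GB (illustrative only)
TRAFFIC_GB = 500                    # assumed monthly tile traffic (illustrative only)
bandwidth = BANDWIDTH_PER_GB * TRAFFIC_GB

print("instance:  $%.0f/month" % instance)
print("bandwidth: $%.0f/month" % bandwidth)
print("total:     $%.0f/month" % (instance + bandwidth))
```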

EC2 is, more or less, exactly not what you want from a tileserver: expensive to run, with slow disks. So why is it popular? First off is buzzwords – cloud, scalable and so on. If you aren’t careful you can easily empty the piggybank on running a handful of tileservers long before you’re running enough of them to do proper demand-based scaling from hour to hour during the day. If you’re trying to “enterprise” your system you’ll worry about failovers long before you need such elastic scaling, and you need your failovers and load balancers running 24×7 too. Second is capacity planning – if you want to do no planning whatsoever, then EC2 is great! But it’s much cheaper to rent a few servers for the first couple of months, and add more to your pool when (if?) your tileserver gets popular. There is a third reason, though, that is quite cool – for things like Development Seed’s TileMill, you can give your tileserver image to someone else extremely easily; it’s their credit card that gets billed, and they can turn on and off as many servers as they like without hassling you.

[Image: Cambridge]

I’ve been setting up a new tileserver for OpenCycleMap that’s not on EC2, and I’ll post here again later with details of how I got on. I’m also working on another couple of map styles – with terrain data, of course – so if you’re interested in hearing more then get in touch.

So in summary:

  • I’d recommend EC2 if you want to pre-generate large numbers of tiles (say a continent down to z16), copy them somewhere and then switch off the renderer
  • I’d consider EC2 for ultra-large setups where you are running 5 or more tileservers already, but only as additional-load machines
  • I wouldn’t recommend EC2 if you want to run an on-the-fly tileserver. Which is what most people want to do.

Any thoughts? Running a tileserver on EC2 and disagree? Let me know below.

11 thoughts on “Map rendering on EC2”

  1. Igor Brejc

    Great post. I’d been looking for info like this for a long time. Looking forward to your next posts on your experiences with running WMS.

  2. Andy

    Hi Jekader – It’s not really, since it just uses one EC2 instance, so it’s not really “cloud-based” from a scaling point of view – it just happens to be hosted on EC2. But it needs serious connectivity, now that it’s delivering tiles at around 30-35Mb/s during the day!

  3. Tom Taylor

    I have a similar experience with EC2 – the disks aren’t great. I recommend Linode though – notably quicker disks, and they have a London datacentre too.

  4. Pingback: igorbrejc.net » Fresh Catch For July 7th

  5. Martin

    Hi
    interesting post. I don’t know EC2 details, but why don’t you use a server with enough memory to hold the 10 GB in RAM?

  6. Andy

    The 10GB is the compressed XML document; it comes out as a few hundred gigabytes when converted into linestrings etc. for rendering. It would be nice to have 256GB+ of RAM, but most people are trying to get 256GB+ of SSD instead.

  7. David Heath

    I worked on a project using EC2. We had a consultant in to set up the system as I wasn’t familiar with it at the time. He used multiple elastic block store instances mounted as a striped raid array to gain higher disk I/O performance. I’m afraid I didn’t understand all the details, but it sounded like a good way to achieve better disk IO performance.

  8. Andy

    Yeah, the EBS is actually faster than the volatile local disks, which doesn’t make a huge amount of intuitive sense to me, but hey! It’s the added expense that starts eating into this – for a well-specced EC2 machine with plenty of EBS attached, you could get a blazing dedicated server at many different hosts, usually with monthly contracts. A dedicated server doesn’t have to be much cheaper per day before the extra contract termination period (which averages only 15 days, if you think about it) becomes irrelevant.

  9. Phil

    Great post,
    I plan to generate tiles for the entire world at zoom level 12 (around 2.23 TB) and I am wondering whether a script already exists to automate the generation.
    (my first idea is to use several EC2 instances to generate the tiles and then push those PNGs to S3)

Comments are closed.