Can OpenStreetMap run on Heroku?

It’s been a long-term goal of mine to see if we can get the OpenStreetMap Website and API running on Heroku.

So can we? In short – mostly, but it’s still got some rough edges.

But why?

OSMF already has the website and API up and running, obviously, and it needs some pretty specialist hardware for the databases. There are no plans whatsoever to move all that to Heroku, so at first glance it seems rather pointless for me to investigate if it actually works. So why am I spending my time on this?

Firstly, it’s useful for other people to have their own working copies of the website. They can deploy the same code, but use their copy to map historic features that aren’t relevant in OpenStreetMap. Or they can use it to make collaborative fantasy maps, or maps for proposed local developments, or other projects that aren’t suitable for putting in the main database.

Secondly, it’s also useful for developers. We already have a system on the OSMF development server for testing branches, which allows non-developers to check how proposed features work, debug them and provide feedback on them. But if you have a branch to test, you need to ask the sysadmins and wait for them to deploy it for you, and you don’t get full control over the instance if you want to debug it or dive into the database.

But mainly, I’m doing this because Heroku is a useful yardstick. Even if you aren’t interested in using Heroku itself, it’s perhaps the best example of a modern Rails hosting environment. If we can make it easy to deploy on Heroku, we’ll have made it easy to deploy in all manner of different deployment situations, whether that’s in-house in a controlled corporate environment, or using cloud native builder images for Kubernetes, or something else. So I think it’s worth using Heroku to find out where the kinks are in deploying our code, and then to iron those kinks out.

Some ironing is required

In no particular order, here are some of the problems I came across today:

  • Our repository is huge. It’s about 380 MB, so it takes a long time to push to Heroku for the first time. What’s worse is that if there’s an error during the first deployment, Heroku doesn’t cache the git repository, so you need to upload it all again (and again). I stumbled across a way to push the code without triggering a build, which saved me heaps of time later on as I fixed other build errors.
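
    If you hit the same problem, one trick (assuming Heroku only triggers a build when the deployment branch itself changes) is to push to a throwaway branch first, so the git objects get uploaded and cached, and only then push master:

    git push heroku master:refs/heads/warmup
    git push heroku master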

  • libarchive isn’t available on Heroku. We use the ffi-libarchive gem as part of our trace processing, but the underlying library is missing, and this stops the build process. I worked around it by removing the gem, but a better approach would be to install the library using Heroku’s apt buildpack.
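
    If we do go down the buildpack route, the apt buildpack reads package names from an Aptfile in the repository root, which for us would contain something like the following (the exact package name would need checking against the Heroku stack):

    libarchive13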

  • Missing settings.local.yml file. There’s a bug about this already, and we’ve added a workaround in all of our install notes, Travis configs and so on, but it rears its head again here.

  • Production compilation of i18n-js assets. In our Configuration Guide we mention the need to compile the translation files for the javascript-based internationalisation, but it’s not clear to me how to run that stage at the right time on Heroku. It needs to be done either before, or as part of, the assets:precompile task that Heroku runs automatically. As a workaround, I just committed the output to the git repo before pushing to Heroku.
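
    One possible fix, though I haven’t tested it, would be to hook the export into the precompile task so that it runs whenever Heroku compiles the assets – a sketch, assuming the i18n:js:export task provided by the i18n-js gem:

    # lib/tasks/i18n.rake (a hypothetical new file)
    # Run the i18n-js export as a prerequisite of assets:precompile
    Rake::Task["assets:precompile"].enhance(["i18n:js:export"])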

  • No ActiveStorage configuration file. This is the nexus of configuration files, twelve-factor, and deployment-agnostic development. I could write a whole essay on this but simply committing config/storage.yml was a good enough workaround for today.
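
    A minimal config/storage.yml – essentially the stock Rails example – is enough for local disk storage:

    local:
      service: Disk
      root: <%= Rails.root.join("storage") %>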

  • Needs a Procfile. Heroku introduced the concept of the Procfile, a file which lists everything that needs to run to make the whole app work. Here’s a Procfile that works for us:

    web: bundle exec rails server -p $PORT
    worker: bundle exec rake jobs:work
    

    This tells Heroku that we want a webserver, but that we also need a worker process to import traces and send emails asynchronously. Heroku then runs both processes for us, and it also detects and configures a database automatically.

  • No ActiveStorage for traces. We have our trace import system hardcoded to use local file paths, but on Heroku those are ephemeral. In future we’ll move trace importing to ActiveStorage, but there are a couple of blockers on that for now.

So a few minor hiccups, but nothing particularly insurmountable. I was pleasantly surprised! While we work on resolving some of these problems, there’s a whole load of things that already work.

How we got here

I would describe getting to this stage as a useful “side effect” of lots of previous work. The most recent was replacing the compiled database functions with SQL equivalents, which are significantly easier to handle when you don’t control the database server. Previous work on using ActiveJob for trace processing makes the worker dyno setup possible, and moving the quad_tile code to an external gem made this part of the installation seamless.

Did I mention the price?

So how much does deploying your own copy of the OpenStreetMap website on Heroku cost? Nothing. Zero dollars per month. Heroku gives you 1 web and 1 worker dyno for free on each application that you deploy, and that’s all we need. It offers a free database tier too, which is enough to get started. Of course these have limitations, and you’ll need to pay for some S3 storage if you want user images (and traces, in the future). But I think it’s worth pointing out that you can spin up your own deployment without having to pay, and dive into testing those new features or creating your own OSM data playground. I’m sure that’ll be useful to many people.

Smoother API upgrades for OpenStreetMap

Recently, my coding attention has drifted towards the OpenStreetMap API. The venerable version 0.6 API celebrated its 10th birthday earlier this year, so it’s had a good run – clocking up double the age of all previous versions combined!

Years ago, whenever we wanted to change the API, we gathered all the OSM developers in one room for a hack weekend, and discussed and coded the changes required there and then. For the most recent change, from 0.5 to 0.6, we had to do some database processing in order to create changesets, and so we simply switched off the API for a long weekend. Nobody could do any mapping. And everyone had to upgrade all their software that same weekend, since anything that used the 0.5 API simply stopped working.

In short, I don’t think we can use this approach next time! Moreover, I think this ‘big bang’ approach to making API changes is actually the main reason that we’ve stuck with API version 0.6 for so long. Sure, it works, and it’s clearly ‘good enough’. But it can be better. Yet with such a high barrier to making any changes, it’s not surprising that we’ve got ourselves a little bit stuck on this version.

So I’ve started work on refactoring the codebase so that we can support multiple API versions at the same time. There’s a bunch of backwards-incompatible changes that we want to make to the API, but since those are not fundamentally changing the concepts of OSM, it makes sense to run version 0.6 and 0.7 at the same time. This gives us a smooth transition period, and allows all the applications and tools that use the current API version a chance to upgrade in their own time.

SotM 2019 was in Heidelberg last month, and among other things gave me a chance to talk face-to-face with lead developers from four different OpenStreetMap editors. The editors are the key pieces of software that use the API, so this was a rare chance to check in and find out what changes they would like to see. I spoke with developers of Vespucci, iD, JOSM – and of course Level0! My task now is to figure out which of their requests and suggestions are blocked by having to break API compatibility, or which ones can be implemented in the current API version.

One example of a breaking change is moving to structured error responses, a topic Simon Poole raised with me straight away. Often the API returns a human-readable error message when there’s a problem. For example, during a changeset upload, the API could return “Precondition failed: Way 1234 is still used by relations 45,3557,357537”. This isn’t particularly handy, since editors then need a bunch of regular expressions to try to match each error message, and to carefully parse the element ids out of those human-readable sentences. Changing these into structured responses would be more useful for developers, but that requires a version increment.
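
To illustrate the idea, a structured version of that response could look something like this (a hypothetical shape for discussion, not a settled design):

{
  "error": {
    "type": "precondition_failed",
    "message": "Way 1234 is still used by relations",
    "way": 1234,
    "relations": [45, 3557, 357537]
  }
}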

Another change that I am particularly keen to see, but perhaps not so keen to code, is to make diff uploads more robust. They currently suffer from two main problems. The first is that the diff can be large and therefore complex to parse and apply to the database, and so it can take a long time to receive a response from the API. For mobile editors, in particular, it can be hard to maintain an open connection to the server for long enough to hear back whether it was successfully applied to the database. So I’d like to move to a send-and-poll model, where the API immediately acknowledges receipt of the diff upload, and gives the client a URL that it can poll to check on progress and to receive the results. That way if there’s a hiccup on a cellular connection, the editor can just poll again on a fresh connection.
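
In outline, the exchange might look like this (the paths and responses are purely illustrative):

POST /api/0.7/changeset/123/upload
  => 202 Accepted, with a Location header pointing at a status URL

GET /api/0.7/changeset/123/upload/status
  => 200 OK, "still processing"

GET /api/0.7/changeset/123/upload/status   (some time later)
  => 200 OK, the usual diffResult document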

Relatedly, the second problem is what the editor should do if it never hears back from the server after sending a diff upload. If the mapper has added a hundred new buildings, presses save, and the editor gets no response from the API, should the software try again? If it doesn’t, then the mapper loses all their work. If it tries the upload again (and again?), it could be creating duplicate map data each time. It’s impossible to know what happened – did the request get lost on the way to the server, or did the server receive it, save it, and it was the response that got lost? My proposal is to include an idempotency key in the diff upload. The server will store these keys, and if it sees the same upload key for a second (or third) time, it will realise the response has been lost somewhere, and no harm is done. A similar approach can then be taken with other uploads, like creating notes or adding comments, to allow safe retries for those too.
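
On the wire, this could be as simple as an extra header on the upload request (the header name and key format here are illustrative – it’s the same pattern that payment APIs use for safe retries):

POST /api/0.7/changeset/123/upload
Idempotency-Key: 1c8a6e2b-5bd0-4b8e-9f3a-2d4c6e8f0a1b

If the server has already seen that key, it replies with the stored result of the first attempt rather than applying the diff again.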

There are many other upgrades that I’d like to make to the API, focussed on a broader strategy of making life easier for editor developers. But without the ability to run multiple versions in parallel, none of these changes are likely to happen. So that’s my first priority.

Better Trace Uploads, New API Call and More i18n for OpenStreetMap

June was a busy month for my ongoing development work on OpenStreetMap.

The highlight of the month for me was that we wrapped up the project to move GPX upload processing to the built-in job queue system. This means that we can run the GPX processing jobs in parallel, and spread the tasks between different machines instead of being limited to one process on one machine. This makes it much less likely that your uploaded trace will get stuck at the back of a queue.

It also means that the notification emails that you receive will be translated into your preferred language; it makes it much easier to test the processing or add new features; and we can get rid of some old code that nobody was looking after.

The key piece of work to get this released was being able to create the trace animations from within our Ruby code. mmd-osm worked with the developer of the gd2-ffij rubygem to add support for animated gifs. That was the last piece of the puzzle, and when a new version of the gem was released, we were good to go.

Towards the end of the month I developed a new ‘api/versions’ API call. This gives applications an easier way to find out which versions of the API the server supports, rather than using the unversioned ‘api/capabilities’ call (the ‘api/0.6/capabilities’ call is what applications should use for checking capabilities, since the capabilities could change between API versions in future). It seems almost completely unimportant at the moment, since we’ve been on version 0.6 for over 10 years now, but it’s solving another piece of the jigsaw that will allow us to run version 0.7 (and perhaps future versions) alongside an existing API version, which we’ve never done before. There’ll be more work from me to support API 0.7 over the next few months.
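
From memory, the response is shaped something like this (simplified, with most attributes trimmed):

<osm generator="OpenStreetMap server">
  <api>
    <versions>
      <version>0.6</version>
    </versions>
  </api>
</osm>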

Internationalisation (i18n) is a topic that I keep working on, since we’ve got thousands of mappers who don’t speak English, and problems with the i18n system can be really jarring for them when using the site. One feature we use all over the place is describing things that occurred in the past, and previously it wasn’t possible to accurately translate times like “3 days ago” into all languages. I’ve now added a mechanism to make this possible, and it’s a solution that I might take upstream for wider use in other Rails projects.
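
The core of the mechanism is pleasingly small. Rails’ time_ago_in_words helper accepts a :scope option, so the “… ago” form can be given its own set of translations, separate from plain durations – a sketch, with an illustrative scope name:

time_ago_in_words(entry.created_at, :scope => :"datetime.distance_in_words_ago")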

I also rolled up my sleeves and tackled some of the endless and thankless issue pruning! We still have over 450 open requests across two different issue trackers. Some of the issues are important, but are lost in the pile. Many, though, are feature requests that are either quite niche, or ideas that seem reasonable but that realistically we aren’t likely to ever get around to doing. I find these the hardest to deal with, since it’s hard to close them without being unwelcoming. Yet the variety of small features that could theoretically be added is endless, and the project doesn’t gain anything from keeping a never-ending list of feature requests in the queue. So I try to deal with them as politely as I can.

Finally, some more refactoring. This time some minor changes, including the way we create forms and render partials. It doesn’t make the website run faster, but it’s less code for developers to read, and it means new developers will see better examples in the existing code and so find it easier to build new features. I try to keep creating a better and more enjoyable ‘developer experience’, particularly for new developers who might not be familiar with the tools, in the hope we can attract and retain some more developers.

That’s all for now!

Moderation, Authorisation and Background Task Improvements for OpenStreetMap

We’re making steady progress with improving the OpenStreetMap codebase. Although it’s been a busy year for me, I managed to wrap up the moderation pull request, and it was merged and deployed in mid June. Lots of different people have worked on this over the years, starting way back as a GSoC project in 2015! It was great to finally get it integrated into the site.

Last week mavl, one of our site moderators, posted that nearly a thousand issues were logged by OSM volunteers in the first 3 months – at an average rate of over 10 per day! That’s a lot more than I was expecting, so future development ideas could involve working with the OpenStreetMap moderators to support their workflows.

Also in June, the Ruby for Good team picked us as one of their projects for their annual development event. A bunch of good stuff came out of that, in particular a new “quad_tile” gem, which builds one of our C extensions automatically during gem installation, and therefore saves new developers from having to deal with that.

In addition, they also kick-started work on our new authorisation framework, based on CanCanCan, which I outlined previously. It’s taken longer than I would have liked to get that ready, but the code was merged and deployed last week, and I’ve started refactoring the rest of our controllers to use it. This will allow us to remove a lot of home-grown authorisation code and use a standard approach instead, which means less custom code to maintain as well as being more familiar to new developers. I see this authorisation framework as a key enabler for future projects – we’ll be able to develop new features faster, while making sure that only the right people have access.
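
To give a flavour of the CanCanCan approach, all the permissions are declared in a central Ability class. Here’s a simplified sketch (not our real rules):

class Ability
  include CanCan::Ability

  def initialize(user)
    # Anyone, logged in or not, can read diary entries
    can :read, DiaryEntry

    if user
      # Logged-in users can write diary entries, and edit their own
      can :create, DiaryEntry
      can :update, DiaryEntry, :user_id => user.id

      # Moderators also get access to the issue-handling workflow
      can :manage, Issue if user.moderator?
    end
  end
end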

Another recent upgrade has been to set up and start using the Active Job framework. This allows tasks to be run in the background, and is now a standard part of Ruby on Rails. We’re already using it for sending notification emails, so now if you post a comment to a popular diary entry, you don’t need to wait while the system sends dozens of notification emails to all the subscribers – they are queued and dealt with separately. A small improvement, perhaps, but the job framework will really come alive when we start using it for processing GPX trace uploads. I hope to have more news on that soon.
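
At each call site this is mostly a one-word change, along these lines (illustrative):

# Before: the mail was rendered and sent inline, blocking the request
Notifier.diary_comment_notification(comment, subscriber).deliver_now

# After: the mail is queued and sent by a background job
Notifier.diary_comment_notification(comment, subscriber).deliver_later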

If you want to follow our progress more closely, or get involved in the development, head over to our GitHub repository for full installation and development guides, along with all the issues and pull requests that we are dealing with!

Groundwork for new features in OpenStreetMap

Last summer I finished a large refactoring, and thought it would be nice to change tack, so I decided to try to push through a new feature as my next project. Refactoring is worthwhile, but it has a long-term pay-off. New features, on the other hand, show progress to a wider audience, and so they are another avenue to getting people interested and involved in development.

I picked an old Google Summer of Code project that hadn’t really been wrapped up, and immediately spotted a bunch of changes that would be needed to make it easier for others to help review it. Long story short, it needed a lot more work than anyone realised and it’s taken me a few months to get it ready. I’ve learned a few lessons about GSoC projects along the way, but that’s a story for another time.

I want to keep going with the refactoring, since a better codebase leads to happier developers and eventually to better features. But it’s worthwhile having some sort of a goal, otherwise it’s hard to decide what’s important to refactor, and to avoid getting lost in the weeds. There have been discussions in the past about adding some form of Groups to OpenStreetMap, and it’s a topic that keeps on coming up. But I know that if anyone tried implementing Groups on top of our current codebase, it would be impossible to maintain, and it’s far too big a challenge for a self-contained project like GSoC.

So what things do I think would make it easier to implement Groups? The most obvious piece of groundwork is a proper authorisation framework. Without that, deciding who can view messages in or add members to each group would be gnarly. I also don’t want to add many more new features to the site with our “default allow” permissions – it’s too easy to get that wrong, particularly adding something substantial and complex like Groups.

I had a stab at adding the authorisation framework a few weeks ago, but quickly realised some more groundwork would help. Defining the permissions becomes easier if we use standard Rails resource routing in more places. However, that involves refactoring controller methods and renaming various files. That refactoring, in turn, becomes easier if we use standard Rails link helpers, and if we use shortened internationalization lookups.
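
To make that concrete, the general shape of those changes is along these lines (illustrative code, not lifted from the actual diffs):

# config/routes.rb – standard resource routing instead of hand-rolled routes
resources :diary_entries

# in a view – a standard link helper, with a shortened ("lazy") i18n lookup
link_to t(".edit"), edit_diary_entry_path(entry)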

So there’s some more groundwork to do before the groundwork before the groundwork…

Factory Refactoring – Done!

After over 50 Pull Requests spread over the last 9 months, I’ve finally finished refactoring the openstreetmap-website test suite to use factories instead of fixtures. Time for a celebration!

As I’ve discussed in previous posts, the openstreetmap-website codebase powers the main OpenStreetMap website and the map editing API. The test suite has traditionally only used fixtures, where all test data was preloaded into the database and the same data used for every test. One drawback of this approach is that any change to the fixtures can have knock-on effects on other tests. For example, adding another diary entry to the fixtures could break a different test which expects a particular number of diary entries to be found by a search query. There are also more subtle problems, including the lack of clear intent in the tests. When you read a test that asserts that a given Node or Way is found, it was often not clear which attributes of the fixture were important – perhaps that Node belonged to a particular Changeset, or had a particular timestamp, or was in a particular location, or a mixture of other attributes. Figuring out these hidden intents for each test was often a major source of the refactoring effort – and there are 1,080 tests in the suite, with more than 325,000 total assertions!

Changing to factories has made the tests independent of each other. Now every test starts with a blank database, and only the objects that are needed are created, using the factories. This means that tests can more easily create a particular database record for the test at hand, without interfering with other tests. The intent of each test is often clearer too, since the objects are created from factories within the test itself, making it explicit which attributes are the important ones.
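
For anyone unfamiliar with them, a factory is a short recipe for building a valid object, where any attribute can be overridden by an individual test. A simplified sketch of what a diary entry factory might contain:

factory :diary_entry do
  title { "A diary entry" }
  body  { "Some interesting text" }
  user
end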

An example of the benefits of factories was when I fixed a bug around encoding of diary entry titles in our RSS feeds. I easily created a diary entry with a specific and unusual title, without interfering with any other tests or having to create yet another fixture.

def test_rss_character_escaping
  create(:diary_entry, :title => "<script>")
  get :rss, :format => :rss

  assert_match "<title>&lt;script&gt;</title>", response.body
end

All in all, this took much, much longer than I was expecting! Looking back, I might have picked a different task had I known, but I’m glad that it’s all done now. I’m certainly glad that I won’t have to do it all again! Moving on, it’s now easier for me and the other developers to write robust tests, and this will help us implement new features more quickly.

Big thanks go to Tom Hughes for reviewing all 51 individual pull requests! Thanks also to everyone who lent me their encouragement.

So the big question is – what will I work on next?

Steady progress on the OpenStreetMap Website

Time for a short status update on my work on the openstreetmap-website codebase. It’s been a few months since I started refactoring the tests and the work rumbles on. A few of my recent coding opportunities have been taken up with other projects, including the blogs aggregator, the 2017 budget for the OSMF Operations Working Group (OWG), and the new OWG website.

With the fixtures refactoring I’ve already tackled the low-hanging fruit. So now I’m forced to tackle the big one – converting the Users fixtures. The User model is unsurprisingly used in most tests for the website, so the conversion is quite time-consuming and I’ve had to break this down into multiple stages. However, when this bit of the work is complete most future Pull Requests on other topics can be submitted without having to use any fixtures at all. The nodes/ways/relations tests will then be the main thing remaining for conversion, but since the code that deals with those changes infrequently, it’s best to work on the User factories first.

As I’ve been working on replacing the fixtures, I’ve come across a bunch of other things I want to change. But before tackling all that, I’m going to mix things up a bit. My goal is to alternate between the work I think is the most important and helping other developers with their own work. We have around 40 outstanding pull requests, and some need a hand to complete. There are plenty of straightforward coding fixes among the 250 open issues that I can work on too. I hope that if more of the issues, and particularly the pull requests, are completed, this will motivate some more people to get involved in development.

If you have any thoughts on what I should be prioritising – particularly if you’ve got an outstanding pull request of your own – then let me know in the comments!

Upgrading the OpenStreetMap Blogs Aggregator

One of my projects over the winter has been upgrading the blogs.openstreetmap.org feed aggregator. This site collects OpenStreetMap-themed posts from a large number of different blogs and shows them all in one place. The old version of the site was certainly showing its age. The software that powered it, called PlanetPlanet, hasn’t been updated for over 10 years and can’t cope with feeds served over https, so an increasing number of blogs were disappearing from the site. Time for an upgrade.

My larger goal was moving the administration of the site into the open, in order to get more people involved. Shaun has maintained the old system for many years and has done a great job. Unfortunately there were tasks that could only be done by him, such as adding new feeds, removing old feeds, and customising the site. To make any changes you had to know who to ask, and hope that Shaun had time to work on it for you. It was also unclear what criteria a blog feed had to meet to be added, and even more so, if and when a blog should be removed.

The challenge was therefore to move the configuration to a public repository, move the deployment to OSMF hardware, create a clear policy for managing the feeds, and thereby reduce the barriers to getting involved all round.

Thankfully, other people did almost all of the work! After I investigated the different feed aggregation software options – most of them are barely maintained nowadays – I reckoned that Pluto was the best choice. It turns out Shaun had previously come to the same conclusion, and had almost completed the migration himself. Then Tom put together the Chef deployment configuration, which was the second half of the work. So all I had to do was finish converting the list of feeds, make a few template changes, and everything was ready to go. After a few weeks’ delay while the sysadmins worked on other tasks, the new site went live on Friday.

If you know of a blog that should be added to the site, please check our new guidelines. If you have any changes you want to make to the site, the deployment, or the software that powers it, now is your chance to get involved!

Refreshing the OpenStreetMap Codebase

The codebase that powers OpenStreetMap is older than any other Rails project that I work on. The first commit was in July 2006, and even then, that was just a port of an existing pre-Rails system (which is why you might still see it referred to as “The Rails Port”).

It’s a solid, well-tested and battle-hardened codebase. It’s frequently updated too, particularly its dependencies. But if you know where to look, you can see its age. We have very few enduring contributors, which is surprising given its key position within the larger OpenStreetMap development community. So I’ve been taking a look to learn what I can do to help.

[Figure: OpenStreetMap-website contributions]

Someone just getting started with Rails will find that many parts of our code don’t match what’s in any of the books they’ve read, or any of the guides online. More experienced developers will spot a lot of things that were written years ago and would nowadays be done differently. And some of our developers, particularly our Summer of Code students, are learning what to do by reading our existing code, so our idiosyncrasies accumulate.

I started trying to fix a minor bug with the diary entries, and a bunch of things struck me as needing a thorough refresh. I started the process of refactoring the tests to use factories instead of fixtures – 2 down, 37 to go. I’ve started rewriting the controllers to use the standard Rails CRUD method names. And I’ve made a list of plenty of other things that I’d like to tackle, all of which will help lower the barrier for new (and experienced) developers who want to get stuck into the openstreetmap-website code.
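
For the controller work, the renaming looks roughly like this (a couple of illustrative examples, not an exhaustive list):

# Bespoke action names      Standard Rails CRUD names
#   def list           =>     def index
#   def view           =>     def show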

I hope that progress will snowball – as it becomes easier to contribute, more people will join in, and in turn help make it even easier to contribute.

But it’s time-consuming. I need help to share these projects around. If you’re interested, please get stuck in!

Getting Involved in the Operations Working Group

For the last few years I’ve been trying to get more people involved in the Operations Working Group – the team within the OpenStreetMap Foundation that runs all of our services. Each time I think “why aren’t more people involved”, I try to figure out some plausible barriers to entry, and then work on fixing them.

One of the reasons is that we’re very quiet about what we do – there’s a pride in keeping OpenStreetMap humming along and not causing too much of a fuss. But the lack of publicity hurts us when we’re trying to get more people involved. Hence this blog post, among other reasons.

I’ve been working recently (as in, for the last few years) on making as much of our OWG activities public as possible, rather than hidden away on our mailing list or in our meetings. So we now have a public issue tracker showing our tasks, and we also publish a monthly summary of our activities.

To make OpenStreetMap work, we run a surprisingly large number of servers. For many years we maintained a list of our hardware, and what each server was being used for, on the OpenStreetMap wiki, which helped new people find out how everything works. Maintaining this information was a lot of work, and the wiki was often outdated. For my own projects I use Jekyll to create static websites (which are easier to find hosting for than websites that need databases), and information on Jekyll sites can be generated with the help of data files. Since OWG uses Chef to configure all of our servers, and Chef knows both the hardware configuration and what each machine is used for, it occurred to me that we could automate these server information pages entirely. That website is now live at hardware.openstreetmap.org, so we have a public, accurate and timely list of all of our hardware and the services running on it.

Now my attention has moved to our Chef configuration. Although the configuration has been public for years, currently the barrier to entry is substantial. One straightforward (but surprisingly time-consuming) improvement was to simply write a README for each cookbook – 77 in all. I finished that project last week.

Unless you have administrator rights to OSMF hardware (and even I don’t have those!) you need to write the chef configuration ‘blind’ – that is, you can propose a change but you can’t realistically test that it works before you make a pull request. That makes proposing changes close to impossible, so it’s not surprising that few non-administrators have ever contributed changes. I have experience with a few tools that can help, the most important being test-kitchen. This allows a developer to check locally that their changes work, and have the desired effect, before making a pull request; it also allows the administrators to check that the PR works before deploying the updated cookbook. Both Matt and I have been working on this recently, and today I proposed a basic test-kitchen configuration.
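
For a flavour of what’s involved, a minimal setup is a single .kitchen.yml file describing how to build and converge a test machine – a sketch, where the driver and platform may well differ from what I actually proposed:

driver:
  name: docker

provisioner:
  name: chef_zero

platforms:
  - name: ubuntu-16.04

suites:
  - name: default
    run_list:
      - recipe[serverinfo::default]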

This will only be the start of a long process, since eventually most of those 77 cookbooks will need test-kitchen configurations. Even in my initial attempts to test the serverinfo cookbook (that generates hardware.openstreetmap.org) I found a bunch of problems, some of which I haven’t yet figured out how to work around. There will be many more of these niggles found, but the goal is to allow future developers to improve each cookbook using only their own laptops.

All of these are small steps on the long path to getting more people involved in the Operations Working Group. If you’re interested in helping, get stuck in to both our task list and our chef repo, and let me know when you get stuck.