Pubs: An OpenStreetMap completeness survey

Nov 29, 2009 by Joseph Reeves

OpenStreetMap completeness is something that gets spoken about a lot at OA; we already use OSM data in places, but there are some who feel uneasy about the data when it is provided without any indication of its completeness or quality. I think sometimes these questions can be a little unkind; they're rarely asked of Google, for example, and it often falls upon the OpenStreetMap crowd to highlight inaccuracies in this ubiquitous mapping resource. Early last year I noticed some myself.

Despite my moaning, completeness studies are very important. One of the beauties of OpenStreetMap is that you can download all the data yourself, making these studies uniquely possible; the only country you could do similar analysis for based on Google Maps data is Kenya. Luckily, people are looking at the completeness of the map: Muki Haklay, for example, has released comparisons of OSM data showing completeness in March 2008 and October 2009.

Bored yesterday, and stuck inside with a cold, I decided to see if I could come up with my own completeness metric. Muki Haklay's example was concerned with roads; I wanted to try something based on points of interest (PoIs).

The Guardian provides via their excellent DataBlog the number of pubs in the UK: 53,466. From CloudMade we are able to download a gpx file containing a mention of every restaurant, pub & takeaway within the UK recorded on OpenStreetMap. We can very easily pull out the number of pubs and come up with the percentage figure demonstrating the completeness of pub recording in the UK:

joseph@joseph-work:~$ grep -ci 'Pub:' united_kingdom_Eating_Drinking.gpx

(20 620 / 53 466) * 100 = 38.5665657

Let's round that up and say that 38.57% of the pubs in the UK are recorded on OpenStreetMap.
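For anyone who would rather script the sum than reach for a calculator, here is a minimal Python sketch of the same calculation. The sample lines are made up for illustration (the real layout of the CloudMade GPX file may differ), and the function names are my own; only the 20,620 and 53,466 figures come from above. Note that, like `grep -c`, it counts matching lines rather than individual matches.

```python
def count_matching_lines(lines, needle):
    """Count lines containing `needle`, ignoring case (similar to grep -ci)."""
    needle = needle.lower()
    return sum(1 for line in lines if needle in line.lower())


def completeness_percent(recorded, reference_total):
    """Share of the reference total that has been recorded, as a percentage."""
    return recorded / reference_total * 100


# Hypothetical sample lines, NOT taken from the real CloudMade file.
sample = [
    "<name>Pub: The Red Lion</name>",
    "<name>Restaurant: Hypothetical Bistro</name>",
    "<name>pub: The Crown (closed)</name>",
]

print(count_matching_lines(sample, "Pub:"))           # 2
print(round(completeness_percent(20620, 53466), 2))   # 38.57
```

To run it against a real file you could pass an open file object as `lines`, since iterating over a file yields one line at a time.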

Whilst we can't be sure of the quality of the data provided by the Guardian, we can look a little closer at the OSM data. Some of it, for example, is pretty bad; other pubs have had the word "closed" added to their names, which is probably not something you need on a map:

joseph@joseph-work:~$ grep -i 'Pub:' united_kingdom_Eating_Drinking.gpx | grep -ci 'closed'

Regardless, these numbers are small, so the 38.57% figure still likely rings true enough. Some might argue that it's a small number, but the number of contributors to OpenStreetMap continues to rise at a very positive rate; as such, I'm sure that if this were re-run towards the end of 2010 we'd see a big improvement.
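The "closed" check can be sketched in Python in the same spirit as the piped grep above. This is only an illustration under assumptions: the sample lines and function names are hypothetical, and the tiny numbers are there to show the mechanics, not to stand in for real results.

```python
def pubs_with_word(lines, word="closed"):
    """Return (pub_lines, flagged_lines), both matched case-insensitively.

    Roughly mirrors: grep -i 'Pub:' file | grep -ci 'closed'
    """
    pubs = [line for line in lines if "pub:" in line.lower()]
    flagged = [line for line in pubs if word.lower() in line.lower()]
    return pubs, flagged


def adjusted_percent(total_pubs, flagged, reference_total):
    """Completeness percentage after discounting the flagged pubs."""
    return (total_pubs - flagged) / reference_total * 100


# Hypothetical sample lines, NOT taken from the real CloudMade file.
sample = [
    "<name>Pub: The Red Lion</name>",
    "<name>Pub: The Crown (CLOSED)</name>",
    "<name>Pub: The Plough</name>",
    "<name>Restaurant: Hypothetical Bistro (closed)</name>",
]

pubs, flagged = pubs_with_word(sample)
print(len(pubs), len(flagged))  # 3 1
```

Bear in mind that a plain substring match, here as with grep, would also flag a name such as "The Enclosed Arms", so any count obtained this way is worth eyeballing.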

If you're reading this and don't know if your local is on OpenStreetMap, why not take a look; once you've got an account you only need to click the edit button to add it.


Great to see this posting. I was thinking of doing something similar recently, but taking the Beer in the Evening site as a baseline. They list 37164 pubs, so still not up to the level of the Guardian's list. With the CloudMade data, do they include the 'bar' amenities in with the pubs as well?

It'd be great to compare on a per-region basis to see where would be a good place to hold a pub mapping party :)

Posted by Dan Karran on November 30, 2009 at 01:52 PM GMT #

Good to see that amenities also have a progress counter. You are lucky in the UK that the road network coverage in OSM is so good, so people start to look for other types of data missing in the database. I am mapping a country that is still missing a lot, with big areas of nothing (though we probably have one of the largest nothings in the world too), so mappers here are still not going out of their way to find a few fresh ones. But here too recruitment is progressing; every week we see new names on our mailing list, and contributions to the map keep moving us towards the eternal goal of a complete map of the world.

Posted by Aun Johnsen on December 01, 2009 at 08:46 PM GMT #

Here's another progress indicator -- postboxes:

At the moment we have 24072 UK postboxes in the OSM database, of which 11873 have a reference number. Tom Taylor's Freedom of Information Act request yielded a list of 116088 for the UK, so I make that 20.7% and 10.2% respectively.

The coverage varies quite widely across the UK though. E.g. I have 80% of the CT prefix mapped and CO has almost 50% whereas, for example, CF hasn't got any referenced postboxes yet.

Posted by Gregory Williams on December 03, 2009 at 10:58 AM GMT #

Here's the SQL that I used (note that the UK polygon is quite crude):
select
count(*) total,
count(case when ref is not null then 1 end) referenced
from planet_osm_point
where amenity = 'post_box'
and way && transform(GeometryFromText('POLYGON((-6.877441 49.181703,-2.307129 50.120578,0.834961 50.722547,1.977539 51.165567,2.241211 52.816043,0.263672 61.386198,-7.822266 63.045001,-12.370605 56.438204,-6.547852 55.416544,-8.129883 54.41893,-7.316895 54.123822,-6.943359 54.380557,-5.712891 53.826597,-5.646973 52.442618,-7.822266 50.078295,-6.965332 49.267805,-6.877441 49.181703))', 4326),900913)

Posted by Gregory Williams on December 03, 2009 at 10:59 AM GMT #

Hmmm, just tried to get the pub count by using my above query with amenity = 'pub' and amenity = 'bar'. These yield 22523 and 350 respectively. Filtering out those with "closed" somewhere in the name yields 21561 and 318. So, admittedly a slightly different way of measuring it, but a pretty reasonable improvement in numbers since the CloudMade POI snapshot a few days ago.

Posted by Gregory Williams on December 03, 2009 at 11:25 AM GMT #

