Digital Finds

joseph dot reeves at thehumanjourney dot net
@iknowjoseph

Pubs: Extending a completeness survey

Dec 04, 2009 by Joseph Reeves

In my last blog post I counted up the number of pubs provided within CloudMade's (25/11/09) OSM export and compared it to the number of pubs we're told exist within the UK; in short, about 38% of UK pubs are in OpenStreetMap. I got some really great comments on that last entry [1] and was looking forward to the next release from CloudMade to see the rate at which the number of recorded UK pubs was (hopefully) growing.

Gregory Williams posted some interesting numbers drawn directly from the database, rather than my CloudMade numbers. He looked for pubs and bars and found 22523 and 350 respectively; I was hoping that this number represented a big increase in recorded pubs, perhaps indicative of the rate at which the UK is being mapped. I was wrong, however, as the release of CloudMade's 02/12/09 dataset revealed:

joseph@joseph-work:~$ grep -ci 'Pub:' united_kingdom_Eating_Drinking.gpx
20742
joseph@joseph-work:~$ grep -i 'Pub:' united_kingdom_Eating_Drinking.gpx | grep -ci 'closed'
33

(20 742 / 53 466) * 100 = 38.7947481

Whilst the number of pubs recorded in the UK has risen (as has the number of pubs recorded with the word "closed" in their name), the running total still stands at c.38% of the Guardian's figure. It seems, then, that Gregory Williams' SQL query was pulling out roughly 2000 pubs from the database that CloudMade's export doesn't include. I think that's quite interesting and possibly something worth looking into. Gregory also notes that 20.7% of UK postboxes are recorded on OSM; I guess we can say that mappers prefer pubs to posting.
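
For anyone who would rather not rely on grep, the count and percentage above can be roughly sketched in Python. This is only a sketch under assumptions: it assumes the CloudMade file is ordinary GPX in which each point is a <wpt> element whose <name> begins with "Pub:", and the function names count_pubs and percent_recorded are mine, not part of any tool. It counts each waypoint once, however many of its child elements repeat the name.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any XML namespace, e.g. '{http://...}wpt' -> 'wpt'."""
    return tag.rsplit('}', 1)[-1]


def count_pubs(gpx_source):
    """Return (pubs, closed_pubs), counting each <wpt> at most once.

    Assumption: a pub is a waypoint whose <name> starts with 'Pub:'.
    gpx_source may be a filename or a binary file object.
    """
    total = closed = 0
    for _, elem in ET.iterparse(gpx_source):
        if local(elem.tag) != 'wpt':
            continue
        name = ''
        for child in elem:
            if local(child.tag) == 'name':
                name = child.text or ''
                break
        if name.lower().startswith('pub:'):
            total += 1
            if 'closed' in name.lower():
                closed += 1
        elem.clear()  # free memory; the export is a large file
    return total, closed


def percent_recorded(recorded, official):
    """Recorded pubs as a percentage of the official total."""
    return 100.0 * recorded / official
```

Calling count_pubs('united_kingdom_Eating_Drinking.gpx') and passing the first value to percent_recorded along with 53466 should repeat the calculation above, although the totals may differ slightly from the grep figures because the matching rule is not identical.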

This time around I've taken the CloudMade export and plotted the position of OpenStreetMap's record of UK pubs as a simple map:


[Image: map of UK pubs recorded in OpenStreetMap, plotted from the CloudMade export]

Gary Jones, our GIS officer, gave me a good run-through of QGIS 1.3.0; I imported the CloudMade gpx file, pulled out the pubs and plotted them on top of the coastline export. The pubs as a shape file are available here under the same CC-A-SA-2.0 license as the image above. Apologies for the lack of coastline for the Isle of Man in my image; I tried to add it, but it crashed QGIS and seemed to wreck my saved project file, so I left it be. If anyone else creates any other pub maps, please let me know.

There are certainly gaps in the map above! Is that because there are missing pubs or simply areas without pubs? I guess that's the point of a completeness survey! I can say that there are gaps on the image within which I have certainly enjoyed a pint. Presumably the c.62% of UK pubs that don't appear on the map above are located both in areas that are completely blank and within partially completed regions. The UK is too big an area to conduct meaningful completeness surveys on; we need official pub numbers and OpenStreetMap counts for a smaller selection of areas. Perhaps county would be an appropriate scale to start with?
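
If per-county figures did become available, the comparison itself would be simple. The following is a minimal sketch, assuming two hypothetical dictionaries keyed by area name: one of OpenStreetMap pub counts and one of official pub counts. Neither dataset exists in this post, and the function name completeness_by_area is made up for illustration.

```python
def completeness_by_area(osm_counts, official_counts):
    """Return {area: percentage of official pubs recorded in OSM}.

    Both arguments are hypothetical inputs: dicts mapping an area
    name (e.g. a county) to a pub count. Areas with no official
    pubs are skipped to avoid dividing by zero; areas missing from
    osm_counts are treated as having zero recorded pubs.
    """
    result = {}
    for area, official in official_counts.items():
        if official <= 0:
            continue
        result[area] = 100.0 * osm_counts.get(area, 0) / official
    return result
```

Sorting the result by percentage would show which areas are furthest from complete, which is the kind of detail the UK-wide figure of c.38% hides.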

Something to mull over on a dark and cold evening...

[1] If you comment on my blog please be aware that long posts may get wrongly marked as spam. This is a massive pain in the backside, but don't worry about it; I will see your comment and mark it as not-spam and it'll appear as it should do. You can safely ignore any warnings you get.



Comments:

This should reassure you a bit more:
grep -ci '<name>Pub' united_kingdom_Eating_Drinking.gpx
Gives 21603 off of the same 02-Dec CloudMade POI file and
grep -i '<name>Pub' united_kingdom_Eating_Drinking.gpx | grep -civ 'closed'
yields 21570, when excluding the closed ones.

The subtlety is that the pubs without a name (which admittedly need to be fixed) weren't being picked up by your filters.

For a bit of fun here's something to look at the frequency of the various pub names:
select
    regexp_replace(regexp_replace(name, '^The ',''),' and ',' & ','i') pub_name,
    count(*)
from planet_osm_point
where amenity = 'pub'
    and name <> ''
    and way && transform(GeometryFromText('POLYGON((-6.877441 49.181703,-2.307129 50.120578,0.834961 50.722547,1.977539 51.165567,2.241211 52.816043,0.263672 61.386198,-7.822266 63.045001,-12.370605 56.438204,-6.547852 55.416544,-8.129883 54.41893,-7.316895 54.123822,-6.943359 54.380557,-5.712891 53.826597,-5.646973 52.442618,-7.822266 50.078295,-6.965332 49.267805,-6.877441 49.181703))', 4326),900913)
group by pub_name
order by count(*) desc

This massages the name a little to remove any superfluous leading "The " and converts all " and "s to " & " such that they're considered to be the same name.
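
For anyone without a PostGIS database to hand, the same name massaging can be sketched in Python against any list of pub names. This is only an illustration: the function names normalise and name_frequencies are made up, and the list of names is assumed to come from wherever you have extracted them. It mirrors the query as written, so only the first " and " in each name is replaced, since PostgreSQL's regexp_replace replaces just the first match unless the 'g' flag is given.

```python
import re
from collections import Counter


def normalise(name):
    """Drop a leading 'The ' and turn the first ' and ' into ' & '."""
    name = re.sub(r'^The ', '', name)
    # count=1 mirrors regexp_replace without the 'g' flag;
    # IGNORECASE mirrors the 'i' flag on the outer call.
    return re.sub(r' and ', ' & ', name, count=1, flags=re.IGNORECASE)


def name_frequencies(names):
    """Return (name, count) pairs, most common first, skipping empty names."""
    return Counter(normalise(n) for n in names if n).most_common()
```

Passing a list such as ['The Red Lion', 'Red Lion', 'Rose and Crown'] groups the two Red Lions together and reports the third as "Rose & Crown".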

As expected "Red Lion" comes top with 282. This is followed by 223 "Royal Oak"s, 180 "Crown"s, 137 "White Horse"s, 136 "White Hart"s, 128 "Plough"s, 125 "Rose & Crown"s, etc.

It may be an interesting exercise to look at the regional variation in names. For example I wouldn't expect many "George"s or "George & Dragon"s outside of England, and I wonder how many inland "Ship"s or "Ship Inn"s there may be?

Posted by Gregory Williams on December 04, 2009 at 06:35 PM GMT #

I think your metrics are completely wrong. A typical entry looks like this:

<wpt lat="51.012692500" lon="5.902214400">
<name>Pub:Gaststätte Jütten</name>
<cmt>Pub:Gaststätte Jütten</cmt>
<desc>Pub:Gaststätte Jütten</desc>

You are overestimating by a factor of 3!!!

Use:

grep -ci '<name>Pub:' united_kingdom_Eating_Drinking.gpx

Posted by blubb on December 05, 2009 at 12:30 PM GMT #

Joseph actually did the division by 3 in the figures that he presented, but he just didn't mention that.

Posted by Gregory Williams on December 07, 2009 at 10:03 AM GMT #

Thanks Gregory,

To be honest I didn't divide by 3 but used a string something like the one Blubb provided; I must have made a mistake when I was writing it up. Either way, the figure still stands at c.38%.

I'll update this Friday with the grep strings you mention above and the correct commands copied in.

Looking forward to seeing how many pubs get recorded this week!

Cheers all, Joseph

Posted by Joseph Reeves on December 07, 2009 at 03:34 PM GMT #

Hello, Joseph. Really nice map :) I tried to download the zip file and found that there is no DBF file in it, so at least gvSIG won't open it. Regards,
Juan Lucas

Posted by Juan Lucas on December 20, 2009 at 10:44 PM GMT #
